SEMIPARAMETRIC MODELS
Consider a semiparametric model with a Euclidean parameter
and an infinite-dimensional parameter, to be called a Banach
parameter. Assume:

(a) There exists an efficient estimator of the Euclidean parameter. 

(b) When the value of the Euclidean parameter is known, there 
exists an estimator of the Banach parameter, which depends on this 
value and is efficient within this restricted model. 

Substituting the efficient estimator of the Euclidean parameter for 
the value of this parameter in the estimator of the Banach parame- 
ter, one obtains an efficient estimator of the Banach parameter for the 
full semiparametric model with the Euclidean parameter unknown. 
This hereditary property of efficiency completes estimation in semi- 
parametric models in which the Euclidean parameter has been es- 
timated efficiently. Typically, estimation of both the Euclidean and 
the Banach parameter is necessary in order to describe the random 
phenomenon under study to a sufficient extent. Since efficient esti- 
mators are asymptotically linear, the above substitution method is a 
particular case of substituting asymptotically linear estimators of a 
Euclidean parameter into estimators that are asymptotically linear 
themselves and that depend on this Euclidean parameter. This more 
general substitution case is studied for its own sake as well, and a 
hereditary property for asymptotic linearity is proved. 

1. Introduction. Estimation of a parameter is not a goal in itself. Typ- 
ically, the purpose is to determine a reliable picture of future behavior of 
a random system. In semiparametric models this means that estimation of 
just the finite-dimensional, Euclidean parameters does not finish the job. 
The values of the Banach parameters are needed to complete the picture.
The situation in classical parametric models is similar. Consider linear re- 
gression under normal errors with unknown variance. The regression param- 
eters are the parameters of interest, but the variance of the errors, although 
of secondary interest, is essential to describe the behavior of the dependent 
variable at a particular value of the independent variable, as for instance in 
prediction. In semiparametric linear regression the error distribution with 
mean zero is completely unknown. Again, this distribution is essential in de- 
scribing the behavior of the dependent variable. Therefore, its distribution 
function has to be estimated. This may be done along the following lines: 

1. Estimate the regression parameter vector $\theta$ efficiently using (by now
standard) semiparametric theory.

2. Given the true value of the parameter $\theta$, the error distribution function $G$
can be estimated efficiently, since the i.i.d. errors can be reconstructed
from the observations in this case.

3. Using the estimated value of $\theta$, construct the residuals and instead of the
i.i.d. errors use these residuals to estimate the Banach parameter $G$ in
the same way as in step 2.
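
The three steps above can be sketched numerically. The following is a minimal illustration only, under assumptions that are not part of the paper: a regression through the origin, uniform mean-zero errors, and ordinary least squares as a stand-in for the efficient estimator of step 1 (least squares is generally not semiparametrically efficient). The function names are hypothetical.

```python
import random
from bisect import bisect_right


def least_squares_slope(y, z):
    """Least-squares slope for y = theta * z + error (no intercept).

    Stand-in for the efficient estimator of step 1; an assumption of
    this sketch, not the estimator prescribed by the paper.
    """
    return sum(yi * zi for yi, zi in zip(y, z)) / sum(zi * zi for zi in z)


def ecdf(values):
    """Return the empirical distribution function of `values`."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def residual_ecdf(y, z, theta):
    """ECDF of y_i - theta * z_i for a given (true or estimated) theta."""
    return ecdf([yi - theta * zi for yi, zi in zip(y, z)])


random.seed(0)
theta_true = 2.0
z = [random.uniform(-1.0, 1.0) for _ in range(2000)]
errors = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # mean-zero errors
y = [theta_true * zi + ei for zi, ei in zip(z, errors)]

theta_hat = least_squares_slope(y, z)        # step 1 (stand-in estimator)
G_oracle = residual_ecdf(y, z, theta_true)   # step 2: theta known
G_subst = residual_ecdf(y, z, theta_hat)     # step 3: theta replaced by its estimate
```

Comparing `G_subst` with `G_oracle` at a few points shows how little the substitution changes the estimate in this toy setting; whether the substitution estimator is efficient is the question addressed by the theory.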

The crux of the present paper is that the resulting estimator of G is 
efficient. In fact, for any semiparametric model, we will prove that this ap- 
proach, which is in line with statistical practice, yields an efficient estimator 
of the Banach parameter, provided a sample splitting scheme is applied. 
Since we assume that efficient estimators of $\theta$ are available, we shall focus
on efficient estimation of the Banach parameter $G$ in the presence of the
Euclidean nuisance parameter $\theta$. Sample splitting is unnecessary and the
direct substitution estimator works if the conditional estimator of the Banach
parameter given $\theta$ depends on $\theta$ in a smooth way. In order to be able
to estimate $G$ efficiently according to our approach, it is essential in
nonadaptive cases that in step 1 the Euclidean parameter $\theta$ can be estimated
efficiently in the semiparametric sense. The Banach parameter needed for
more complete inference, like the distribution function $G$ of the errors in
semiparametric linear regression, typically is unequal to the Banach parameter
needed in efficient semiparametric estimation of $\theta$, this parameter being
the score function $-d\log(dG(x)/dx)/dx$ for location in the linear regression
model. In fact, Klaassen (1987) has shown that $\theta$ can be estimated
efficiently if and only if the efficient influence function for estimating $\theta$ can be
estimated consistently and $\sqrt{n}$-unbiasedly, given $\theta$, and $\theta$ can be estimated
$\sqrt{n}$-consistently, with $n$ denoting sample size. Of course, this efficient
influence function depends on the Banach parameter of interest, but typically
differs from it.
To give a more explicit and precise statement of our results, let $\mathcal{P}$ be our
semiparametric model given by

(1.1) $\mathcal{P} = \{P_{\theta,G} : \theta \in \Theta,\ G \in \mathcal{G}\}$,

where $\Theta$ is an open subset of $\mathbb{R}^k$ and $\mathcal{G}$ is a subset of a Banach or preferably
a Hilbert space $(\mathcal{H}, \langle\cdot,\cdot\rangle_{\mathcal{H}})$. Typically, in a natural parametrization, $G$ would
be a distribution function and hence an element of a Banach space $L_\infty$. If
a $\sigma$-finite measure would dominate the distributions in $\mathcal{G}$, then an obvious
parametrization would be via the corresponding densities $g$ of $G$, which
are elements of a Banach space $L_1$. However, via the square roots of these
densities we parametrize by elements of a Hilbert space $L_2$. Therefore, we shall assume
that $\mathcal{G}$ is a subset of a Hilbert space or can be identified with it. We are
interested in estimating a parameter

(1.2) $\nu = \nu(P_{\theta,G}) = \tilde\nu(G)$,

where $\tilde\nu : \mathcal{G} \to \mathcal{B}$ ($\mathcal{B}$ Banach space) is pathwise differentiable; see (4.4) for
details. Estimation has to be based on i.i.d. random variables $X_1, X_2, \ldots, X_n$
with unknown distribution $P_{\theta,G} \in \mathcal{P}$ on the sample space $(\mathcal{X}, \mathcal{A})$. Let $\mathcal{P}_2(\theta)$
be the submodel of $\mathcal{P}$ where $\theta$ is known. Let the submodel estimator $\hat\nu_{\theta,n}$ be
an efficient estimator of $\nu$ within $\mathcal{P}_2(\theta)$. Suppose that we also have an
estimator $\hat\theta_n$ of $\theta$ at our disposal within $\mathcal{P}$. Following step 3 above, an obvious
candidate for estimating $\nu$ in the full model $\mathcal{P}$ would be the substitution
estimator $\hat\nu_{\hat\theta_n,n}$. We shall show that a split-sample modification of $\hat\nu_{\hat\theta_n,n}$ is
an efficient estimator of $\nu$ in $\mathcal{P}$ if $\hat\theta_n$ is an efficient estimator of $\theta$ in $\mathcal{P}$. In
adaptive cases, for $\hat\nu_{\hat\theta_n,n}$ to be efficient in $\mathcal{P}$ it is sufficient that the estimator
$\hat\theta_n$ be $\sqrt{n}$-consistent. The substitution estimator $\hat\nu_{\hat\theta_n,n}$ itself is
semiparametrically efficient if the submodel estimator $\hat\nu_{\theta,n}$ depends smoothly on $\theta$, which
is typically the case.

The asymptotic linearity of the efficient estimators involved warrants the 
resulting substitution estimator to be asymptotically linear as well. We study 
this hereditary property of asymptotic linearity of estimators for its own sake 
in Section 2, where we refrain from the efficiency assumptions made above. 
In Section 3 we discuss such simple examples as the sample variance and 
estimators of the standardized error distribution in linear regression. There 
we will also introduce models that we propose to call parametrized linkage 
models. In Section 4 we will collect some results about efficient influence 
functions in the various (sub)models that we consider. Section 5 contains 
our main results for efficiency. In Section 6 we will discuss a number of 
examples. 

A general class of semiparametric models $\mathcal{P} = \{P_{\theta,G} : \theta \in \Theta,\ G \in \mathcal{G}\}$ in
which our results apply is the class of models that can be handled by profile
likelihood. If $l_n(\theta, G)$ is the appropriately defined likelihood of $n$ independent
observations from $P_{\theta,G}$, then a maximum likelihood estimator $\hat\theta_n$ of $\theta$ can be
found by maximizing the profile likelihood

(1.3) $pl_n(\theta) = \sup_{G \in \mathcal{G}} l_n(\theta, G)$.

This amounts to maximizing the likelihood in two steps. First maximize
with respect to $G$ for a given $\theta$. The maximizer of $l_n(\theta, G)$ with respect
to $G$, say $\hat G_n(\theta)$, will generally depend on $\theta$. Placing the submodel
estimator $\hat G_n(\theta)$ back into the likelihood, we obtain a function of $\theta$ only,
$pl_n(\theta) = l_n(\theta, \hat G_n(\theta))$. Murphy and van der Vaart (2000) show that the
profile likelihood can to a large extent be viewed as an ordinary likelihood.
In particular, under some regularity conditions, asymptotic efficiency of
the maximizer $\hat\theta_n$ of (1.3) can be proved. Important in this construction
is the fact that the maximizer of the likelihood with respect to $G$,
obtained in the first maximization step, $\hat G_n(\theta)$, is not yet a complete
estimator of $G$. This submodel estimator is only an estimator of $G$ for a given
value of $\theta$, just as in step 2 of our linear regression example above.
Having found an efficient estimator $\hat\theta_n$ of $\theta$, estimation of $(\theta, G)$ is then completed
by considering the obvious substitution estimator $\hat G_n = \hat G_n(\hat\theta_n)$. The
estimator $\hat G_n(\theta)$ for given $\theta$ is already available as a result of the maximizing
step in (1.3). The Banach parameter $\tilde\nu(G)$ or $G$ itself will not generally be
estimable at $\sqrt{n}$-rate, but it may be possible to estimate real-valued
functionals $\kappa = \kappa(P_{\theta,G}) = \tilde\kappa(G)$ of $G$ at $\sqrt{n}$-rate. In cases where $\tilde\kappa(\hat G_n(\theta))$ is
an efficient estimator of $\tilde\kappa(G)$ given $\theta$, our results can be applied to yield a
fully efficient estimator $\tilde\kappa(\hat G_n(\hat\theta_n))$ of $\tilde\kappa(G)$. Numerous examples fall into this
class, some of them treated in some detail in this paper, like the Cox
proportional hazards model for right censored data (Example 6.6) and for
current status data [Huang (1996) and Bolthausen, Perkins and van der Vaart
(2002)], frailty models [Nielsen, Gill, Andersen and Sørensen (1992)], the
proportional odds model [Murphy, Rossini and van der Vaart (1997)],
selection bias models [Gilbert, Lele and Vardi (1999) and Cosslett (1981)] and
random effects models [Butler and Louis (1992)].

We will consider a number of examples in more detail in Section 6, namely 
estimation of the variance with unknown mean, estimation of the error dis- 
tribution in parametrized linkage models and in particular in the location 
problem with the bootstrap as an application, estimation of a (symmetric) 
error distribution in linear regression as an example of the adaptive case, 
and finally, estimation of the baseline distribution function in the Cox pro- 
portional hazards model. 

The framework of the present paper has been presented in Klaassen and Putter
(1997) within the linear regression model with symmetric error distribution
and has been used by Müller, Schick and Wefelmeyer (2001) in their
discussion of substitution estimators in semiparametric stochastic process models.

There are fundamental theorems of algebra, arithmetic and calculus.
Statistics has its fundamental rule of thumb. It states that "replacing unknown
parameters in statistical procedures by estimators of them yields
appropriate procedures." This paper describes a large class of estimation problems
where this rule of thumb is indeed a theorem.



2. Heredity of asymptotic linearity of substitution estimators. In this 
section we will study the local asymptotic behavior of estimators that are 
obtained by combining two asymptotically linear estimators in the way de- 
scribed in Section 1. We will prove the hereditary property that under cer- 
tain regularity conditions the resulting estimators are asymptotically linear 
as well and we will describe their influence functions. The main application 
of this heredity result is to efficient estimators as described in Section 1. 
This will be pursued in Section 5, but we believe the hereditary property is 
of independent interest as well. 

Although we will apply this hereditary property to semiparametric models
$\mathcal{P}$ as in (1.1), we will be able to restrict attention in the present section to
parametric models since the phenomenon under study occurs within the
natural parametric submodels $\mathcal{P}_1(G) = \{P_{\theta,G} : \theta \in \Theta\}$ of $\mathcal{P}$ with $G \in \mathcal{G}$ fixed.

So, within this section, let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^k$ open, be a parametric
model, and let $X_1, X_2, \ldots, X_n$ be the i.i.d. random variables with
distribution $P_\theta \in \mathcal{P}$ on the sample space $(\mathcal{X}, \mathcal{A})$ that are used for estimation. Since
our considerations are of the usual local asymptotic type, we introduce an
arbitrary fixed $\theta_0 \in \Theta$ at which the local asymptotics is focused.

For every $m \in \mathbb{N}$ let $\Psi_m$ be the set of all measurable functions $\psi$ from
$\mathcal{X} \times \Theta$ into $\mathbb{R}^m$ such that $\int \psi(x;\theta)\,dP_\theta(x) = 0$ and $\int |\psi(x;\theta)|^2\,dP_\theta(x) < \infty$
for all $\theta \in \Theta$, where $|\cdot|$ denotes a Euclidean norm. Fix $m \in \mathbb{N}$ and consider
a differentiable function $\kappa$ from $\Theta$ into $\mathbb{R}^m$.



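Definition 2.1. An estimator $\hat\kappa_n$ of $\kappa(\theta)$ is locally asymptotically linear
at $\theta_0$ if there exists a $\psi \in \Psi_m$ such that

(2.1) $\sqrt{n}\,\Bigl|\hat\kappa_n - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \psi(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$

for all sequences $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded. We call $\psi$ the influence
function of $\hat\kappa_n$.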

Suppose we have an estimator $\hat\theta_n = t_n(X_1, \ldots, X_n)$, $t_n : \mathcal{X}^n \to \mathbb{R}^k$,
$\mathcal{A}^n$-Borel measurable, that is a locally asymptotically linear estimator of $\theta$ at
$\theta_0$ with influence function $\psi_\theta \in \Psi_k$, that is,

(2.2) $\sqrt{n}\,\Bigl|\hat\theta_n - \theta_n - \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$

holds for all sequences $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded. Suppose
furthermore that there is a process $\hat\kappa_{\theta,n} = \kappa_n(X_1, \ldots, X_n; \theta)$ that is locally
asymptotically linear in $\psi_\kappa \in \Psi_m$ around $\kappa(\theta)$ such that

(2.3) $\sqrt{n}\,\Bigl|\hat\kappa_{\theta_n,n} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$

holds for all sequences $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded. Note that we have
extended here the concept of local asymptotic linearity from estimators of
$\kappa(\theta)$, as in Definition 2.1, to statistics indexed by $\theta$. This is quite reasonable
since the gist of the concept is that the relevant statistic behaves as an
average locally asymptotically.

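We want to describe the local asymptotic behavior of the substitution
estimator

(2.4) $\hat\kappa_{n,1} = \hat\kappa_{\hat\theta_n,n}$,

which replaces the unknown $\theta$ by its estimator $\hat\theta_n$ in $\hat\kappa_{\theta,n}$.
Heuristically, by (2.3), $\sqrt{n}(\hat\kappa_{\hat\theta_n,n} - \kappa(\theta_0))$ behaves like

(2.5) $\sqrt{n}\Bigl(\kappa(\hat\theta_n) - \kappa(\theta_0) + \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\hat\theta_n)\Bigr)$.

Now it is natural to assume the existence of a matrix-valued function $c : \Theta \to \mathbb{R}^{m \times k}$
that is continuous at $\theta_0$ and that is such that for every sequence $\{\theta_n\}$ with
$\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded,

(2.6) $\Bigl|\frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_\kappa(X_i;\theta_n) - \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_\kappa(X_i;\theta_0) - c(\theta_0)\sqrt{n}(\theta_n - \theta_0)\Bigr| \xrightarrow{P_{\theta_0}} 0$

holds. Since $\kappa$ is differentiable, $\sqrt{n}(\hat\kappa_{\hat\theta_n,n} - \kappa(\theta_0))$ would then behave like

$\sqrt{n}\Bigl(\kappa'(\theta_0)(\hat\theta_n - \theta_0) + \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\theta_0) + c(\theta_0)(\hat\theta_n - \theta_0)\Bigr)$

and hence, by (2.2), like

$\frac{1}{\sqrt{n}}\sum_{i=1}^n \bigl(\psi_\kappa(X_i;\theta_0) + (\kappa'(\theta_0) + c(\theta_0))\psi_\theta(X_i;\theta_0)\bigr)$.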
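The estimator $\hat\kappa_{\hat\theta_n,n}$ thus inherits its asymptotic linearity from the
submodel estimator $\hat\kappa_{\theta,n}$ and the estimator $\hat\theta_n$. To study this asymptotic
linearity more carefully, we first describe a sample splitting procedure, for which
we can prove statements under minimal conditions. Fix a sequence of
integers $\{\lambda_n\}_{n=1}^\infty$, such that

(2.7) $\lambda_n / n \to 1/2$.

We split the sample $(X_1, \ldots, X_n)$ into two parts, $(X_1, \ldots, X_{\lambda_n})$ and $(X_{\lambda_n+1}, \ldots, X_n)$.
Define

$\hat\theta_{n1} = t_{\lambda_n}(X_1, \ldots, X_{\lambda_n})$, $\hat\theta_{n2} = t_{n-\lambda_n}(X_{\lambda_n+1}, \ldots, X_n)$,

(2.8) $\hat\kappa^{(1)}_{\theta,\lambda_n} = \kappa_{\lambda_n}(X_1, \ldots, X_{\lambda_n}; \theta)$, $\hat\kappa^{(2)}_{\theta,n-\lambda_n} = \kappa_{n-\lambda_n}(X_{\lambda_n+1}, \ldots, X_n; \theta)$

and

(2.9) $\hat\kappa_{n,2} = \frac{\lambda_n}{n}\,\hat\kappa^{(1)}_{\hat\theta_{n2},\lambda_n} + \frac{n-\lambda_n}{n}\,\hat\kappa^{(2)}_{\hat\theta_{n1},n-\lambda_n}$.

The following theorem describes the influence function of this split-sample
substitution estimator $\hat\kappa_{n,2}$.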



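Theorem 2.1. Fix $\theta_0 \in \Theta$. Suppose that $\kappa : \Theta \to \mathbb{R}^m$ is continuously
differentiable in a neighborhood of $\theta_0$ with derivative matrix $\kappa'$ and suppose
that conditions (2.2), (2.3) and (2.6) hold for some $c : \Theta \to \mathbb{R}^{m \times k}$ that is
continuous at $\theta_0$. Then $\hat\kappa_{n,2}$ defined by (2.7)-(2.9) is locally asymptotically
linear for $\kappa$ at $\theta_0$ with influence function $\tilde\psi$ given by

(2.10) $\tilde\psi(x;\theta) = \psi_\kappa(x;\theta) + (\kappa'(\theta) + c(\theta))\psi_\theta(x;\theta)$,

that is, for every sequence $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded,

(2.11) $\sqrt{n}\,\Bigl|\hat\kappa_{n,2} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \tilde\psi(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.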



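Proof. Fix $\theta_0 \in \Theta$ and the sequence $\{\theta_n\}$. Take another sequence $\{\tilde\theta_n\}$
such that $\{\sqrt{n}(\tilde\theta_n - \theta_0)\}$ stays bounded. Combining (2.3), with $\theta_n$ replaced
by $\tilde\theta_n$, and (2.6), both with $\theta_n$ and with $\theta_n$ replaced by $\tilde\theta_n$, we obtain, using
$(X_1, \ldots, X_{\lambda_n})$,

$\sqrt{\lambda_n}\,\Bigl|\hat\kappa^{(1)}_{\tilde\theta_n,\lambda_n} - \kappa(\tilde\theta_n) - \frac{1}{\lambda_n}\sum_{i=1}^{\lambda_n} \psi_\kappa(X_i;\theta_n) - c(\theta_0)(\tilde\theta_n - \theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$,

which by continuous differentiability of $\kappa(\cdot)$ and continuity of $c(\cdot)$ at $\theta_0$ yields

(2.12) $\sqrt{\lambda_n}\,\Bigl|\hat\kappa^{(1)}_{\tilde\theta_n,\lambda_n} - \kappa(\theta_n) - \frac{1}{\lambda_n}\sum_{i=1}^{\lambda_n} \psi_\kappa(X_i;\theta_n) - (\kappa'(\theta_n) + c(\theta_n))(\tilde\theta_n - \theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.

By the asymptotic linearity of $\hat\theta_n$ we have

(2.13) $\sqrt{n-\lambda_n}\,\Bigl|\hat\theta_{n2} - \theta_n - \frac{1}{n-\lambda_n}\sum_{j=\lambda_n+1}^{n} \psi_\theta(X_j;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.

Hence, by the independence of $(X_1, \ldots, X_{\lambda_n})$ and $(X_{\lambda_n+1}, \ldots, X_n)$, (2.12)
and (2.13) together yield

$\sqrt{\lambda_n}\,\Bigl|\hat\kappa^{(1)}_{\hat\theta_{n2},\lambda_n} - \kappa(\theta_n) - \frac{1}{\lambda_n}\sum_{i=1}^{\lambda_n} \psi_\kappa(X_i;\theta_n) - \frac{1}{n-\lambda_n}\sum_{i=\lambda_n+1}^{n} (\kappa'(\theta_n) + c(\theta_n))\psi_\theta(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.

Similarly we obtain

$\sqrt{n-\lambda_n}\,\Bigl|\hat\kappa^{(2)}_{\hat\theta_{n1},n-\lambda_n} - \kappa(\theta_n) - \frac{1}{n-\lambda_n}\sum_{i=\lambda_n+1}^{n} \psi_\kappa(X_i;\theta_n) - \frac{1}{\lambda_n}\sum_{i=1}^{\lambda_n} (\kappa'(\theta_n) + c(\theta_n))\psi_\theta(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.

These last two statements yield

(2.14) $\sqrt{n}\,\Bigl|\hat\kappa_{n,2} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\theta_n) - \frac{1}{n}\sum_{i=1}^n \Bigl\{\frac{n-\lambda_n}{\lambda_n}\mathbf{1}_{[i \le \lambda_n]} + \frac{\lambda_n}{n-\lambda_n}\mathbf{1}_{[i > \lambda_n]}\Bigr\} \times (\kappa'(\theta_n) + c(\theta_n))\psi_\theta(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$.

In view of (2.7) this shows that $\hat\kappa_{n,2}$ is a locally asymptotically linear
estimator of $\kappa$, with influence function given by (2.10). This proves the theorem.

□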



Note that the expression within braces in (2.14) reveals why (2.7) is crucial 
to our sample splitting scheme. 
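
The construction (2.7)-(2.9) can be written down in a few lines. What follows is a minimal sketch, not the authors' implementation; the function names and the toy location model used to exercise it (standard normal errors, the sample mean as estimator of the location) are assumptions made for illustration only.

```python
import random


def split_sample_substitution(sample, estimate_theta, submodel_estimator):
    """Split-sample substitution estimator in the spirit of (2.7)-(2.9).

    Each half of the sample is evaluated at the parameter estimate computed
    from the OTHER half; the two results are combined with weights
    lambda_n / n and (n - lambda_n) / n.
    """
    n = len(sample)
    lam = n // 2                       # lambda_n / n -> 1/2, cf. (2.7)
    first, second = sample[:lam], sample[lam:]
    theta_1 = estimate_theta(first)    # based on X_1, ..., X_lambda
    theta_2 = estimate_theta(second)   # based on X_{lambda+1}, ..., X_n
    kappa_1 = submodel_estimator(first, theta_2)
    kappa_2 = submodel_estimator(second, theta_1)
    return (lam / n) * kappa_1 + ((n - lam) / n) * kappa_2


def sample_mean(xs):
    return sum(xs) / len(xs)


def centered_ecdf_at(point):
    """Submodel estimator of G(point) in the toy model X = theta + error."""
    return lambda xs, theta: sum(1 for x in xs if x - theta <= point) / len(xs)


random.seed(1)
data = [3.0 + random.gauss(0.0, 1.0) for _ in range(4000)]

kappa_split = split_sample_substitution(data, sample_mean, centered_ecdf_at(0.5))
kappa_direct = centered_ecdf_at(0.5)(data, sample_mean(data))  # direct substitution, cf. (2.4)
```

Taking `lam = n // 2` is the simplest choice compatible with (2.7). In this smooth toy problem the split-sample and direct substitution estimates are expected to be close.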

To establish local asymptotic linearity of the direct substitution
estimator $\hat\kappa_{n,1}$ without sample splitting [cf. (2.4)], we need locally asymptotically
uniform continuity in $\theta$ at $\theta_0$ of the estimators $\hat\kappa_{\theta,n}$ as follows:

For every $\delta > 0$, $\varepsilon > 0$ and $c > 0$, there exist $\zeta > 0$ and $n_0 \in \mathbb{N}$ such that
for all $n \ge n_0$

(2.15) $P_{\theta_0}\Bigl(\sup_{\sqrt{n}|\theta - \theta_0| \le c,\ \sqrt{n}|\theta - \tilde\theta| \le \zeta} \sqrt{n}\,|\hat\kappa_{\theta,n} - \hat\kappa_{\tilde\theta,n}| > \varepsilon\Bigr) < \delta$.

Theorem 2.2. Fix $\theta_0 \in \Theta$. If (2.15) holds in the situation of
Theorem 2.1, then the substitution estimator $\hat\kappa_{n,1} = \hat\kappa_{\hat\theta_n,n}$ is a locally
asymptotically linear estimator of $\kappa$ with influence function $\tilde\psi$ given by (2.10).

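Proof. Fix $\theta_0 \in \Theta$, $\delta > 0$, $\varepsilon > 0$. Choose $c$ and $n_0$ such that for $n \ge n_0$

(2.16) $P_{\theta_0}(\sqrt{n}\,|\hat\theta_n - \theta_0| > c) < \delta$.

Now, choose $\zeta$ sufficiently small such that (2.15) holds too (increase $n_0$ if
necessary), and such that the matrix norm of $\kappa'(\theta_0) + c(\theta_0)$ satisfies

(2.17) $\|\kappa'(\theta_0) + c(\theta_0)\|\,\zeta < \varepsilon/2$.

Let $\hat\theta_n(\zeta)$ be the efficient estimator $\hat\theta_n$ discretized via a grid $G_\zeta$ of meshwidth
$2(kn)^{-1/2}\zeta$, such that

(2.18) $\sqrt{n}\,|\hat\theta_n(\zeta) - \hat\theta_n| \le \zeta$ a.s.

It follows by (2.18) and (2.15) that for $n \ge n_0$ the inequality

(2.19)
$\begin{aligned}
&P_{\theta_0}\Bigl(\sqrt{n}\,\Bigl|\hat\kappa_{\hat\theta_n,n} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \bigl[\psi_\kappa(X_i;\theta_n) + (\kappa'(\theta_n) + c(\theta_n))\psi_\theta(X_i;\theta_n)\bigr]\Bigr| > 4\varepsilon\Bigr) \\
&\quad= P_{\theta_0}\Bigl(\sqrt{n}\,\Bigl|\hat\kappa_{\hat\theta_n,n} - \hat\kappa_{\hat\theta_n(\zeta),n} + \hat\kappa_{\hat\theta_n(\zeta),n} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\theta_n) \\
&\qquad\qquad - (\kappa'(\theta_n) + c(\theta_n))(\hat\theta_n(\zeta) - \theta_n) + (\kappa'(\theta_n) + c(\theta_n))(\hat\theta_n(\zeta) - \hat\theta_n) \\
&\qquad\qquad + (\kappa'(\theta_n) + c(\theta_n))\Bigl\{\hat\theta_n - \theta_n - \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i;\theta_n)\Bigr\}\Bigr| > 4\varepsilon\Bigr) \\
&\quad\le P_{\theta_0}(\sqrt{n}\,|\hat\theta_n - \theta_0| > c) + \delta \\
&\qquad + \sum_{\theta \in G_\zeta,\ \sqrt{n}|\theta - \theta_0| \le c + \zeta} P_{\theta_0}\Bigl(\sqrt{n}\,\Bigl|\hat\kappa_{\theta,n} - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \psi_\kappa(X_i;\theta_n) - (\kappa'(\theta_n) + c(\theta_n))(\theta - \theta_n)\Bigr| > \varepsilon\Bigr) \\
&\qquad + P_{\theta_0}\bigl(\|\kappa'(\theta_n) + c(\theta_n)\|\,\zeta > \varepsilon\bigr) \\
&\qquad + P_{\theta_0}\Bigl(\|\kappa'(\theta_n) + c(\theta_n)\|\,\sqrt{n}\,\Bigl|\hat\theta_n - \theta_n - \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i;\theta_n)\Bigr| > \varepsilon\Bigr)
\end{aligned}$

holds. In view of (2.16), in view of the boundedness of the number of terms
in the sum with all terms converging to zero by (2.12), in view of (2.17) and
the continuity of $\kappa' + c$, and in view of the linearity of $\hat\theta_n$ [see (2.2)], the
limsup as $n \to \infty$ of the right-hand side of (2.19) equals at most $2\delta$. Since
$\delta$ may be chosen arbitrarily small, this proves the asymptotic linearity. □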



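Remark 2.1. In some cases, it may happen that $\kappa'(\cdot) + c(\cdot) = 0$. Then it
is easily seen that the influence function of the substitution estimators $\hat\kappa_{n,1}$
and $\hat\kappa_{n,2}$ is given by $\tilde\psi(\cdot;\cdot) = \psi_\kappa(\cdot;\cdot)$, even if $\hat\theta_n$ is not locally asymptotically
linear but is just $\sqrt{n}$-consistent.

Remark 2.2. If $\psi_\kappa(\cdot;\theta)$ is differentiable in $\theta$ with derivative $\dot\psi_\kappa(\cdot;\theta)$,
then Taylor expansion and the law of large numbers suggest $c(\theta) = E_\theta\dot\psi_\kappa(X_1;\theta)$.
Furthermore, differentiation of $E_\theta\psi_\kappa(X_1;\theta) = 0$ with respect to $\theta$ hints at

(2.20) $c(\theta) = E_\theta\dot\psi_\kappa(X_1;\theta) = -E_\theta\psi_\kappa(X_1;\theta)\,\dot l^T(X_1;\theta)$,

with $\dot l(x;\theta)$ the score function for $\theta$, namely $\partial \log p(x;\theta)/\partial\theta$.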

Remark 2.3. Theorem 2.2 is related to a result known as the delta
method; see Section 2.5 of Lehmann (1999). Given the function $\kappa(\cdot)$, choose
$\hat\kappa_{\theta,n} = \kappa(\theta)$. Then the convergence (2.3) holds trivially with $\psi_\kappa(\cdot;\cdot) = 0$.
Furthermore, (2.6) is valid with $c(\cdot) = 0$ and (2.15) holds if $\kappa$ is continuously
differentiable. Now Theorem 2.2 states that the local asymptotic linearity
of $\hat\theta_n$ in (2.2) implies the local asymptotic linearity of $\kappa(\hat\theta_n)$, that is,

$\sqrt{n}\,\Bigl|\kappa(\hat\theta_n) - \kappa(\theta_n) - \frac{1}{n}\sum_{i=1}^n \kappa'(\theta_n)\psi_\theta(X_i;\theta_n)\Bigr| \xrightarrow{P_{\theta_0}} 0$,

and hence by the central limit theorem the asymptotic normality of $\sqrt{n}(\kappa(\hat\theta_n) - \kappa(\theta_0))$
under $P_{\theta_0}$. Note that the delta method states that asymptotic
normality of $\sqrt{n}(\hat\theta_n - \theta_0)$ implies asymptotic normality of $\sqrt{n}(\kappa(\hat\theta_n) - \kappa(\theta_0))$.
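
As a worked illustration of the remark above, consider a toy case that is not taken from the paper: $k = m = 1$, observations with mean $\theta$ and finite variance $\sigma^2$, the sample mean as estimator of $\theta$, and $\kappa(\theta) = \theta^2$. Under these assumptions the influence function and the limiting variance can be written out explicitly.

```latex
% Assumed toy setting: E_\theta X_1 = \theta, Var_\theta X_1 = \sigma^2 < \infty,
% \hat\theta_n = \bar X_n, \kappa(\theta) = \theta^2.
\begin{align*}
  \psi_\theta(x;\theta) &= x - \theta
    && \text{(the sample mean is exactly linear)} \\
  \psi_\kappa(x;\theta) &= 0, \quad c(\theta) = 0
    && \text{(since } \hat\kappa_{\theta,n} = \kappa(\theta) \text{ is nonrandom)} \\
  \tilde\psi(x;\theta) &= \kappa'(\theta)\,\psi_\theta(x;\theta) = 2\theta\,(x - \theta)
    && \text{(formula (2.10))} \\
  \sqrt{n}\,\bigl(\bar X_n^2 - \theta_0^2\bigr)
    &\xrightarrow{d} N\bigl(0,\; 4\theta_0^2\sigma^2\bigr)
    && \text{under } P_{\theta_0}.
\end{align*}
```

The limiting variance $4\theta_0^2\sigma^2$ is the variance of $\tilde\psi(X_1;\theta_0)$, in agreement with the classical delta method.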



Remark 2.4. Under different sets of regularity conditions, the heredity
of asymptotic normality of substitution statistics has been proved by
Randles (1982) and Pierce (1982). Since asymptotic linearity implies
asymptotic normality, both our conditions and our conclusions in proving heredity
of asymptotic linearity are stronger than needed for heredity of asymptotic
normality. However, the approach via differentiability in Section 3 of Randles
(1982) comes pretty close to the assumption of asymptotic linearity.
Moreover, our ultimate goal is the study of efficient estimators, which are bound
to be asymptotically linear.

Let us now discuss sufficient conditions for (2.6). The following standard 
result will be quite helpful and may be verified by studying first and second 
moments. 

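Lemma 2.1. Fix $\theta_0 \in \Theta$. If

(2.21) $E_{\theta_0}\bigl(|\psi_\kappa(X_1;\theta_0 + \varepsilon) - \psi_\kappa(X_1;\theta_0)|^2\bigr) \to 0$ as $\varepsilon \to 0$

holds and the map $\varepsilon \mapsto E_{\theta_0}\psi_\kappa(X_1;\theta_0 + \varepsilon)$ is differentiable at 0 with derivative
matrix $c(\theta_0)$, then (2.6) holds for all sequences $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$
bounded.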

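Sometimes (2.6) may be verified by a direct application of the following
"law-of-large-numbers"-type of result.

Lemma 2.2. Fix $\theta_0 \in \Theta$. For $P_{\theta_0}$-almost all $x$, let $\psi_\kappa(x;\theta)$ be continuously
differentiable in $\theta$ with derivative $\dot\psi_\kappa(x;\theta)$. If

(2.22) $E_{\theta_0}|\dot\psi_\kappa(X_1;\theta)| \to E_{\theta_0}|\dot\psi_\kappa(X_1;\theta_0)|$ as $\theta \to \theta_0$,

then for $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded

(2.23) $\frac{1}{\sqrt{n}}\sum_{i=1}^n \{\psi_\kappa(X_i;\theta_n) - \psi_\kappa(X_i;\theta_0)\} - \sqrt{n}(\theta_n - \theta_0)^T E_{\theta_0}\dot\psi_\kappa(X_1;\theta_0) \xrightarrow{P_{\theta_0}} 0$

holds.

Proof. Write the left-hand side of (2.23) as

$\sqrt{n}(\theta_n - \theta_0)^T \Bigl[\frac{1}{n}\sum_{i=1}^n \{\dot\psi_\kappa(X_i;\theta_0) - E_{\theta_0}\dot\psi_\kappa(X_1;\theta_0)\} + \int_0^1 \frac{1}{n}\sum_{i=1}^n \{\dot\psi_\kappa(X_i;\theta_0 + \zeta(\theta_n - \theta_0)) - \dot\psi_\kappa(X_i;\theta_0)\}\,d\zeta\Bigr]$,

note that the first absolute moment of the last term may be bounded by

$\sqrt{n}\,|\theta_n - \theta_0| \int_0^1 E_{\theta_0}|\dot\psi_\kappa(X_1;\theta_0 + \zeta(\theta_n - \theta_0)) - \dot\psi_\kappa(X_1;\theta_0)|\,d\zeta$

and apply, for example, Theorem A.7.2 of Bickel, Klaassen, Ritov and Wellner
(1993). □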

C. A. J. KLAASSEN AND H. PUTTER 



Condition (2.6) may also be derived via regularity of $\mathcal{P}$ and local
asymptotic normality (LAN) by an argument similar to the one leading to (2.1.15)
of Proposition 2.1.2, pages 16 and 17, of Bickel, Klaassen, Ritov and Wellner
(1993).

Definition 2.2. A parametric model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^k$ open, is
a $k$-dimensional regular parametric model if there exists a $\sigma$-finite dominating
measure $\mu$ such that, with $p(\theta) = dP_\theta/d\mu$, $s(\theta) = p^{1/2}(\theta)$:

(i) for all $\theta \in \Theta$ there exists a $k$-vector $\dot l(\theta)$ of score functions in $L_2(P_\theta)$
such that

(2.24) $s(\tilde\theta) = s(\theta) + \frac{1}{2}(\tilde\theta - \theta)^T \dot l(\theta)s(\theta) + o(|\tilde\theta - \theta|)$

in $L_2(\mu)$ as $|\tilde\theta - \theta| \to 0$;

(ii) for every $\theta \in \Theta$ the $k \times k$ Fisher information matrix $\int \dot l(\theta)\dot l^T(\theta)p(\theta)\,d\mu$
is nonsingular;

(iii) the map $\theta \mapsto \dot l(\theta)s(\theta)$ is continuous from $\Theta$ to $L_2^k(\mu)$.

A priori, it would have been more general if condition (i) of Definition 2.2
had prescribed Fréchet differentiability of $s(\theta)$ with derivative $\dot s(\theta)$ in $L_2^k(\mu)$.
However, it can be shown that all components of $\dot s(\theta)$ would vanish then
almost everywhere where $s(\theta)$ vanishes. Consequently, $\dot s(\theta)$ may be written as
$\dot l(\theta)s(\theta)/2$ in $L_2(\mu)$; see Proposition A.5.3.F of Bickel, Klaassen, Ritov and Wellner
(1993).

This approach to prove (2.6) through regularity and local asymptotic nor- 
mality has been implemented in a preprint of the present paper [Klaassen and Putter 
(2000)]. However, a much nicer argument has been noted by Schick (2001). 

Lemma 2.3. Suppose that the model $\mathcal{P}$ is regular and fix $\theta_0 \in \Theta$. If $\psi_\kappa \in \Psi_m$
satisfies the continuity condition

(2.25) $\|\psi_\kappa(\cdot;\tilde\theta)s(\tilde\theta) - \psi_\kappa(\cdot;\theta)s(\theta)\|_\mu \to 0$,

as $\tilde\theta \to \theta$, then (2.6) is valid with $c(\theta)$ given by (2.20).

Proof. Since the regularity of $\mathcal{P}$ implies Hellinger differentiability at $\theta_0$,
Theorem 2.3 of Schick (2001) may be applied and yields (2.6). The continuity
of $c(\cdot)$ is implied by (2.25) and the regularity of $\mathcal{P}$. □

Remark 2.5. At the end of his Section 1 on page 17, Schick (2001) 
refers to (3.5) of the preprint Klaassen and Putter (2000). This is just (2.6) 
of the present version of this paper. 
3. Examples for asymptotic linearity of substitution estimators. Although
the results of Section 2 are stated within a parametric model, most of the 
applications we have in mind (in particular efficiency as discussed in Sec- 
tion 5) are in the context of semiparametric models where the interest is 
in a functional of the infinite-dimensional parameter only. In the analysis 
of these applications it suffices to study parametric submodels where the 
infinite-dimensional parameter is fixed. Hence the results of Section 2 are 
also applicable in this context. In order to illustrate the heredity of asymp- 
totic linearity of substitution estimators in the framework of semiparametric 
models, however, we need to introduce some notation and conventions spe- 
cific to semiparametric models. 

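Let $\mathcal{P} = \{P_{\theta,G} : \theta \in \Theta,\ G \in \mathcal{G}\}$, $\Theta \subset \mathbb{R}^k$ open, $\mathcal{G} \subset \mathcal{H}$, be our semiparametric
model (1.1). The model $\mathcal{P}$ might be parametric in the sense that $\mathcal{G}$ is
Euclidean. We may represent the elements of $\mathcal{P}$ by the square roots $s(\theta,G) =
p^{1/2}(\theta,G)$ of their densities $p(\theta,G)$ with respect to a $\sigma$-finite dominating
measure $\mu$ if such a dominating measure exists on the sample space $(\mathcal{X}, \mathcal{A})$.

By keeping $G$ fixed and by varying $\theta$ over $\Theta$ we get a parametric submodel
of $\mathcal{P}$, denoted by $\mathcal{P}_1 = \mathcal{P}_1(G)$. Often $\mathcal{P}_1(G)$ will be a regular parametric
model in the sense of Definition 2.2. The $k$-vector of score functions of $\mathcal{P}_1(G)$
will be denoted by $\dot l_1$ then and in particular we will have

(3.1) $s(\tilde\theta, G) = s(\theta, G) + \frac{1}{2}(\tilde\theta - \theta)^T \dot l_1(\theta, G)s(\theta, G) + o(|\tilde\theta - \theta|)$.

Let $X_1, \ldots, X_n$ be an i.i.d. sample from $P_{\theta,G} \in \mathcal{P}$ and let $\kappa : \mathcal{P} \to \mathbb{R}^m$ be
an unknown Euclidean parameter of the model $\mathcal{P}$ with

(3.2) $\kappa(P_{\tilde\theta,G}) = \kappa(P_{\theta,G}) = \tilde\kappa(G)$, $\theta, \tilde\theta \in \Theta$, $G \in \mathcal{G}$,

for some $\tilde\kappa : \mathcal{G} \to \mathbb{R}^m$. Since interest is mainly in estimating a Banach
parameter $\nu = \tilde\nu(G) \in \mathcal{B}$ as in (1.2), a typical choice of $\kappa$ with $m = 1$ would be
$\kappa(P_{\theta,G}) = b^*\tilde\nu(G)$ for some $b^* \in \mathcal{B}^*$, the dual of $\mathcal{B}$; note that such a parameter
$\kappa$ is independent of $\theta$ in the sense of (3.2). Let $\hat\kappa_n = \kappa_n(X_1, \ldots, X_n)$ be an
estimator of $\kappa$ with $\kappa_n : \mathcal{X}^n \to \mathbb{R}^m$ an $\mathcal{A}^n$-Borel measurable function. As in
Definition 2.1, the estimator $\hat\kappa_n$ of $\kappa(P_{\theta,G}) = \tilde\kappa(G)$ is called locally
asymptotically linear at $P_{\theta_0,G}$ if there exists a measurable function $\psi(\cdot;\cdot,G) \in \Psi_m$
such that

(3.3) $\sqrt{n}\,\Bigl|\hat\kappa_n - \tilde\kappa(G) - \frac{1}{n}\sum_{i=1}^n \psi(X_i;\theta_n,G)\Bigr| \xrightarrow{P_{\theta_0,G}} 0$

holds for all sequences $\{\theta_n\}$ with $\{\sqrt{n}(\theta_n - \theta_0)\}$ bounded. The function
$\psi(\cdot;\theta,G)$ is called the influence function of $\hat\kappa_n$ at $P_{\theta,G}$ and $\psi(\cdot;\cdot,G)$ is called
influence function as well.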

The results of Section 2 are illustrated in the following examples. 






Example 3.1 (Sample variance). Let $X_1, \ldots, X_n$ be i.i.d. with distribution
function $G(\cdot - \theta)$ on $\mathbb{R}$. Here $G$ is an unknown distribution function
with mean zero and finite fourth moment. Given $\theta$, a good estimator of the
variance $\tilde\kappa(G) = \int x^2\,dG(x)$ of $G$ is $\hat\kappa_{\theta,n} = n^{-1}\sum_{i=1}^n (X_i - \theta)^2$, which is linear
with influence function

(3.4) $\psi_\kappa(x;\theta,G) = (x - \theta)^2 - \tilde\kappa(G)$.

Since $\theta$ can be estimated by the sample mean $\bar X_n$, which is linear and
hence asymptotically linear, Theorem 2.2 yields the sample variance

(3.5) $\hat\kappa_{\hat\theta_n,n} = S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2$

as a locally asymptotically linear estimator of $\tilde\kappa(G)$ in case $\theta$ is unknown;
note that (2.15) holds in view of the law of large numbers. The sample
variance is adaptive in the sense that it has the same influence function
as in (3.4) because (2.6) holds with $c(\theta) = 0$, as may be verified easily.

Of course this estimator is the prototype of a substitution estimator, used 
routinely to the extent that typically it is not recognized as a substitution 
estimator. 
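
The adaptivity of the sample variance can be checked numerically. The sketch below is illustrative only and rests on assumptions not made in the paper (normal errors, the particular sample size and seed; the function name is hypothetical). It uses the algebraic identity that the estimator with known mean minus the sample variance equals the squared error of the sample mean, so that the difference, multiplied by the square root of the sample size, is small.

```python
import random


def variance_given_mean(xs, theta):
    """Submodel estimator of the variance when the mean theta is treated as known."""
    return sum((x - theta) ** 2 for x in xs) / len(xs)


random.seed(2)
theta_true = 1.5
n = 5000
xs = [theta_true + random.gauss(0.0, 2.0) for _ in range(n)]

x_bar = sum(xs) / n
kappa_oracle = variance_given_mean(xs, theta_true)  # theta known
s2 = variance_given_mean(xs, x_bar)                 # substitution estimator: sample variance
gap = kappa_oracle - s2                             # equals (x_bar - theta_true) ** 2
scaled_gap = n ** 0.5 * gap                         # sqrt(n) times the gap
```

Since `gap` is of order 1/n, `scaled_gap` is of order n^(-1/2), which is the numerical counterpart of the statement that both estimators share the influence function (3.4).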

Example 3.2 (Parametrized linkage models). Observe realizations of
$X_i$, $i = 1, \ldots, n$, that are i.i.d. copies of $X$. In many statistical models the
random variable $X$ is linked to an error random variable $\varepsilon$ with distribution
function $G$. This linkage is parametrized by $\theta \in \Theta \subset \mathbb{R}^k$ and may be described
by a measurable map $t_\theta : \mathcal{X} \to \mathbb{R}$ with

$t_\theta(X) = \varepsilon$.

The prime example is the linear regression model with

$t_\theta(x) = y - \theta^T z$, $x = (y, z^T)^T$, $y \in \mathbb{R}$, $z \in \mathbb{R}^k$, $E\varepsilon = 0$,

yielding the error random variable $\varepsilon$ and

$t_\theta(x) = (y - \nu^T z)/\sigma$, $\theta = (\nu^T, \sigma)^T$, $x = (y, z^T)^T$, $y \in \mathbb{R}$, $z \in \mathbb{R}^{k-1}$,

with $E\varepsilon = 0$, $E\varepsilon^2 = E t_\theta^2(X) = 1$, generating the standardized random
variable $\varepsilon$. Another example is the accelerated failure time model with

$t_\theta(x) = e^{-\theta^T z}y$, $x = (y, z^T)^T$, $y \in [0, \infty)$, $z \in \mathbb{R}^k$,

yielding the standardized life time random variable $\varepsilon$. Recall that the
distribution of $X$ is denoted by $P_{\theta,G}$. The Euclidean parameter is $\theta$ and the
error distribution function or the standardized life time distribution function
$\nu(P_{\theta,G}) = G$ could be the Banach parameter of interest. Given $\theta$, an obvious
estimator of $G$ would be the empirical distribution function of $t_\theta(X_i)$,
$i = 1, \ldots, n$.

We will study estimation of the one-dimensional parameter

$\kappa(P_{\theta,G}) = \tilde\kappa(G) = \int h\,dG$,

where $h$ is some known function with $\int h^2\,dG < \infty$. Taking the empirical
distribution function of $t_\theta(X_i)$ as an estimator of $G$ when $\theta$ is known, we
obtain

$\hat\kappa_{\theta,n} = \frac{1}{n}\sum_{i=1}^n h(t_\theta(X_i))$

as an estimator of $\kappa(P_{\theta,G})$. This estimator is linear and hence locally
asymptotically linear in the sense of (2.3) in the influence function

$\psi_\kappa(x;\theta,G) = h(t_\theta(x)) - \kappa(P_{\theta,G})$, $x \in \mathcal{X}$.

If $\hat\theta_n$ is a locally asymptotically linear estimator of $\theta$ with influence function
$\psi_\theta(\cdot;\theta,G) \in \Psi_k$ as in (2.2), an application of Lemma 2.1 yields the validity
of Theorem 2.1 provided

$E_{\theta_0,G}\bigl(|h(t_{\theta_0+\varepsilon}(X)) - h(t_{\theta_0}(X))|^2\bigr) \to 0$ as $\varepsilon \to 0$

holds and $E_{\theta_0,G}h(t_{\theta_0+\varepsilon}(X))$ is differentiable in $\varepsilon$ at 0 with a derivative
matrix $c(\theta_0)$ that is continuous in $\theta_0$. Noting that $\kappa'(\theta)$ from (2.10) vanishes
here, we see that the local asymptotic linearity of the split-sample
substitution estimator $\hat\kappa_{n,2}$ from (2.9) holds with influence function

$\tilde\psi(x;\theta,G) = h(t_\theta(x)) - \kappa(P_{\theta,G}) + c(\theta)\psi_\theta(x;\theta,G)$.

Note that the sample variance is a special case with $h(x) = x^2$, $t_\theta(x) = x - \theta$
and $c(\theta) = 0$.

In Section 6 we shall consider the most important special case of this
example, the linear regression model, in more detail in the context of efficient
estimation of (functionals of the) error distribution. Here we consider the
linear regression model with standardized errors. Substitution of, for
example, the least squares estimators would lead to the empirical distribution of
the standardized residuals as a natural estimator of $G$. See, for example,
Koul (1992, 2002) or Loynes (1980) for early studies of the empirical
distribution function of regression residuals. With $\kappa(P_{\theta,G}) = \tilde\kappa(G) = \int h\,dG = Eh(\varepsilon)$
for appropriate functions $h : \mathbb{R} \to \mathbb{R}$, Theorems 2.1 and 2.2 hold with

$\psi_\kappa(x;\theta,G) = h\bigl((y - \mu - \nu^T z)/\sigma\bigr) - \int h\,dG$

and

$c(\theta) = -\frac{1}{\sigma}\bigl(Eh'(\varepsilon),\ Eh'(\varepsilon)EZ^T,\ E\varepsilon h'(\varepsilon)\bigr)$.

For $h(\varepsilon) = \varepsilon^3$ we obtain an estimator of the skewness of the error
distribution, whose asymptotic normality has been studied by Pierce (1982) under
normality (see also Remark 2.4).

Example 3.3 (Distribution function in two-sample location model). $X_1, \ldots, X_n$
are i.i.d. copies of $X = (Y, Z)$, where $Y$ and $Z - \theta$ are i.i.d. with density $g$,
$\int y^2 g(y)\,dy < \infty$. The Banach parameter of interest is the distribution
function $G(\cdot) = \int_{-\infty}^{\cdot} g(y)\,dy$. Given the shift parameter $\theta$, it can be estimated by
the linear estimator

(3.6) $\hat G_{\theta,n}(y) = \frac{1}{2n}\sum_{i=1}^n \bigl(\mathbf{1}_{[Y_i \le y]} + \mathbf{1}_{[Z_i - \theta \le y]}\bigr)$.

Since $\hat\theta_n = \bar Z_n - \bar Y_n$ is a linear estimator of $\theta$, the substitution estimator
$\hat G_{\hat\theta_n,n}(\cdot)$ is locally asymptotically linear by Theorem 2.2. A structure similar
to the one in Example 3.2 may be described by the map $t_\theta : \mathcal{X} \to \mathbb{R}^2$ with

$t_\theta(x) = (y, z - \theta)^T$, $y, z \in \mathbb{R}$.

In fact, $\theta$ can be estimated adaptively in Example 3.3, that is, efficiently
within this semiparametric model; see van Eeden (1970) for an early
construction valid for the class of strongly unimodal densities $g$ and Beran
(1974) and Stone (1975) for the most general situation. If we apply such an
asymptotically efficient estimator $\hat\theta_n$, then the resulting estimator $\hat G_{\hat\theta_n,n}(\cdot)$
is asymptotically efficient too, since (3.6) is efficient given $\theta$. This hereditary
property of asymptotic efficiency for substitution estimators follows from the
heredity for linearity, which will be shown in Section 5 and is the main result
of the present paper. As preparation we study efficient influence functions
in the next section.

4. Efficient influence functions. Let $\mathcal H$ be a Hilbert space. A one-dimensional
subset $\{h_\eta \in \mathcal H : -1 < \eta < 1\}$ of $\mathcal H$ is called a path if the map $\eta \mapsto h_\eta$ is
continuously Fréchet differentiable with nonvanishing derivative, implying,
for example, the existence of an $\dot h \in \mathcal H$, $\dot h \ne 0$, with

(4.1)    $h_\eta = h_0 + \eta\dot h + o(\eta)$ in $\mathcal H$

as $\eta \to 0$.

Let $\mathcal P$ be a statistical model, that is, a collection of probability distributions,
and fix $P \in \mathcal P$. A subset $\mathcal P_P = \{P_\eta : -1 < \eta < 1\}$ of $\mathcal P$ is called a path
through $P$ if $P_0$ equals $P$ and $\mathcal P_P$ is a regular one-dimensional parametric
submodel in the sense of Definition 2.2. This implies the existence of a
so-called tangent $t \in L_2(P)$, $t \ne 0$, such that with $s_\eta = \sqrt{dP_\eta/d\mu}$ for some
dominating $\sigma$-finite measure $\mu$ and with $s = s_0$,

(4.2)    $s_\eta = s + \tfrac12\eta t s + o(\eta)$

holds in $L_2(\mu)$. Note that, in contrast to the definition in Bickel, Klaassen, Ritov and Wellner
(1993), the dominating measure $\mu$ may depend on the particular path and
that hence we do not have to assume that our model $\mathcal P$ is dominated. Taking
squares in (4.2) and integrating with respect to $\mu$ we obtain $\int t\,dP = 0$,
which we denote by $t \in L_2^0(P)$.
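
To make (4.2) concrete, the following Python sketch checks the expansion numerically for one simple path of our own choosing, the normal location path $P_\eta = N(\eta, 1)$ with tangent $t(x) = x$. The path, the function name `hellinger_remainder` and the quadrature grid are illustrative assumptions; the code only verifies that the $L_2$ norm of the remainder is $o(\eta)$ in this special case.

```python
import numpy as np

def hellinger_remainder(eta, grid=np.linspace(-12.0, 12.0, 48001)):
    """L2(Lebesgue) norm of s_eta - s - (1/2) eta t s for the N(eta, 1) path with t(x) = x."""
    dx = grid[1] - grid[0]

    def root_density(mean):
        # square root of the N(mean, 1) density
        return np.exp(-0.25 * (grid - mean) ** 2) / (2.0 * np.pi) ** 0.25

    s = root_density(0.0)
    remainder = root_density(eta) - s - 0.5 * eta * grid * s
    return float(np.sqrt(np.sum(remainder ** 2) * dx))   # Riemann sum approximation of the norm
```

Dividing the returned norm by `eta` and letting `eta` decrease should give values tending to zero, in line with the $o(\eta)$ term in (4.2).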

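Let $\mathcal C_P$ be a collection of paths $\mathcal P_P$ in $\mathcal P$ through $P$. By the tangent set
$\dot{\mathcal P}^0$ we denote the set of all tangents $t$ generated by paths in $\mathcal C_P$. The closed
linear span $[\dot{\mathcal P}^0]$ of $\dot{\mathcal P}^0$ is called the tangent space in $\mathcal P$ at $P$ generated by
the collection $\mathcal C_P$ of paths $\mathcal P_P$. This tangent space is denoted by $\dot{\mathcal P} \subset L_2^0(P)$.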

Let $\mathcal B$ be a Banach space with norm $\|\cdot\|_{\mathcal B}$ and consider a map $\nu$ from $\mathcal P$
to $\mathcal B$. We shall call $\nu\colon \mathcal P \to \mathcal B$ pathwise differentiable at $P$ with respect to $\mathcal C_P$
if there exists a continuous, linear map $\dot\nu\colon \dot{\mathcal P} \to \mathcal B$ such that for every path
$\mathcal P_P = \{P_\eta : |\eta| < 1\}$ in $\mathcal C_P$ passing through $P$ with tangent $t$,

(4.3)    $\|\nu(P_\eta) - \nu(P) - \eta\dot\nu(t)\|_{\mathcal B} = o(\eta).$

Following Section 2 of van der Vaart (1991) and Section 5.2 of Bickel, Klaassen, Ritov and Wellner
(1993), we define the efficient influence functions $\tilde\nu_{b^*}$ of $\nu$ as follows: for $b^*$
in the dual space $\mathcal B^*$ of $\mathcal B$ (the space of all bounded linear functions from $\mathcal B$
to $\mathbb R$), the map $b^* \circ \dot\nu\colon \dot{\mathcal P} \to \mathbb R$ is linear and bounded. Hence, by the Riesz
representation theorem there exists a unique element $\tilde\nu_{b^*} \in \dot{\mathcal P}$ such that for
every $t \in \dot{\mathcal P}$,

    $b^* \circ \dot\nu(t) = \langle\tilde\nu_{b^*}, t\rangle = E\,\tilde\nu_{b^*}t.$

Here $\langle\cdot,\cdot\rangle$ denotes the inner product in $L_2(P)$ and $E$ denotes expectation
with respect to $P$. Note that this definition of efficient influence function
depends on $\dot{\mathcal P}$ and hence on the choice of $\mathcal C_P$.

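From now on we take $\mathcal P$ to be a semiparametric model $\mathcal P = \{P_{\theta,G} : \theta \in \Theta,
G \in \mathcal G\}$, as in (1.1), with $\Theta \subset \mathbb R^k$ open, and $\mathcal G \subset \mathcal H$, where $\mathcal H$ is a Hilbert
space. Fix $G$ and let $\mathcal C_G$ be a collection of paths in $\mathcal G \subset \mathcal H$ through $G$.
By $\dot{\mathcal G}$ we denote the tangent space in $\mathcal G$ at $G$ generated by $\mathcal C_G$, that is,
the closed linear span of tangents $\dot G$ at $G$ along a path in $\mathcal C_G$. We focus on
estimation of Banach-valued parameters of the form $\nu = \nu(P_{\theta,G}) = \tilde\nu(G)$,
where $\tilde\nu\colon \mathcal G \to \mathcal B$ is pathwise differentiable; that is, there exists a bounded
linear operator $\dot{\tilde\nu}\colon \dot{\mathcal G} \to \mathcal B$ such that for all paths $\{G_\eta : |\eta| < 1\} \in \mathcal C_G$ with
tangent $\dot G$ [cf. (4.1)],

(4.4)    $\|\tilde\nu(G_\eta) - \tilde\nu(G) - \eta\dot{\tilde\nu}(\dot G)\|_{\mathcal B} = o(\eta).$

Again, for every $b^* \in \mathcal B^*$, the map $b^* \circ \dot{\tilde\nu}\colon \dot{\mathcal G} \to \mathbb R$ is linear and continuous
and hence there exists a unique $\dot{\tilde\nu}_{b^*} \in \dot{\mathcal G}$ such that for every $\dot G \in \dot{\mathcal G}$

(4.5)    $b^* \circ \dot{\tilde\nu}(\dot G) = \langle\dot{\tilde\nu}_{b^*}, \dot G\rangle_{\mathcal H}.$

The elements $\dot{\tilde\nu}_{b^*}$ for $b^* \in \mathcal B^*$ are called the gradients of $\tilde\nu$; they are similar
to the efficient influence functions of $\nu$ described earlier.

If the parametric submodel $\mathcal P_1 = \mathcal P_1(G)$ of our semiparametric model $\mathcal P$
is regular in the sense of Definition 2.2, its tangent space $\dot{\mathcal P}_1$ is defined to
be the closed linear span $[\dot l_1]$ of the $k$-vector of score functions $\dot l_1 = \dot l_1(\theta, G)$.
This agrees with the definition of tangent spaces in arbitrary statistical
models [cf. (4.2)] by several choices of a collection $\mathcal C_\theta$ of paths, for example,
$\mathcal C_\theta = \{\{P_{\theta+\eta e_i,G} \in \mathcal P_1(G) : -1 < \eta < 1\} : i = 1, \ldots, k\}$ with $e_i$, $i = 1, \ldots, k$, unit
vectors.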

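By keeping $\theta$ fixed and by varying $G$ we get another submodel $\mathcal P_2 = \mathcal P_2(\theta)$.
Given a collection $\mathcal C_G$ of paths within $\mathcal P_2(\theta)$, the tangent space $\dot{\mathcal P}_2$ at $P_{\theta,G}$
is defined as the closed linear span in $L_2(P_{\theta,G})$ of all functions $\tau \in L_2(P_{\theta,G})$
such that

(4.6)    $s(\theta, G_\eta) = s(\theta, G) + \tfrac12\eta\tau s(\theta, G) + o(\eta),$

in $L_2(\mu)$, for some path $\{G_\eta : |\eta| < 1\} \in \mathcal C_G$. Note again that $\dot{\mathcal P}_2$ depends on
the choice of $\mathcal C_G$. We assume that $\mathcal C_G$ is chosen in such a way that for every
path $\{P_\eta = P_{\theta+\eta\zeta,G_\eta} : |\eta| < 1\}$ with $\{G_\eta : |\eta| < 1\} \in \mathcal C_G$, there exists a tangent
$\rho \in L_2^0(P_{\theta,G})$ satisfying

(4.7)    $s(\theta + \eta\zeta, G_\eta) = s(\theta, G) + \tfrac12\eta\rho s(\theta, G) + o(\eta),$

in $L_2(\mu)$. The tangent space $\dot{\mathcal P}$ at $P_{\theta,G}$ is the closed linear span in $L_2^0(P_{\theta,G})$
of all these tangents $\rho \in L_2^0(P_{\theta,G})$. Typically, we have $\dot{\mathcal P} = [\dot l_1] + \dot{\mathcal P}_2$.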

In fact, we will assume that the tangents from (4.7) have a special but 
frequently occurring structure, namely that of Hellinger differentiability. 

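Definition 4.1. For every $\theta \in \Theta$ and $G \in \mathcal G$, the model $\mathcal P$ is Hellinger
differentiable at $P_{\theta,G}$ if there exists a bounded linear operator $\dot l\colon \mathbb R^k \times \dot{\mathcal G} \to
L_2(P_{\theta,G})$ such that for every $\zeta \in \mathbb R^k$ and every path $\{G_\eta : |\eta| < 1\} \in \mathcal C_G$ with
tangent $\dot G \in \dot{\mathcal G}$,

(4.8)    $s(\theta + \eta\zeta, G_\eta) = s(\theta, G) + \tfrac12\eta\,(\dot l(\zeta, \dot G))\,s(\theta, G) + o(\eta),$

in $L_2(\mu)$.

The operator $\dot l$ is called the score operator. It may be expressed in terms
of the score function $\dot l_1$ for $\theta$ in $\mathcal P_1(G)$ and the so-called score operator $\dot l_2$
for $G$ in $\mathcal P_2(\theta)$ as follows. For $\zeta \in \mathbb R^k$ and $\dot G \in \dot{\mathcal G}$, we have

(4.9)    $\dot l(\zeta, \dot G) = \dot l_1^{T}\zeta + \dot l_2(\dot G).$

Note that (4.8) and (4.9) reduce to (3.1) in the case where $\{G_\eta : |\eta| < 1\} =
\{G\}$ is a singleton, and to (4.6) in case $\zeta = 0$.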

In the following proposition we collect some fundamental results on the efficient
influence functions for estimating $\theta$ in $\mathcal P$ and for estimating $\nu(P_{\theta,G}) =
\tilde\nu(G)$, both in the submodel $\mathcal P_2(\theta)$ and in the full model $\mathcal P$. The efficient influence
function for estimating $\theta$ in $\mathcal P_1(G)$ is not of immediate interest for
our purposes and hence is not discussed here. Define the efficient score function
$l_1^*$ for estimating $\theta$ in the full model $\mathcal P$ by

(4.10)    $l_1^* = \dot l_1 - \Pi(\dot l_1 \mid \dot{\mathcal P}_2).$

The efficient information matrix at $P_{\theta,G}$ for estimating $\theta$ in $\mathcal P$ is defined as

(4.11)    $I_*(\theta) = E(l_1^* l_1^{*T}).$

Define the information operator as $\dot l_2^T\dot l_2\colon \dot{\mathcal G} \to \dot{\mathcal G}$ and let $(\dot l_2^T\dot l_2)^{-}a$ be a solution
$h \in \dot{\mathcal G}$ of $\dot l_2^T\dot l_2 h = a$, for $a \in \dot{\mathcal G}$. Let $N(A)$ and $R(A)$ denote the null space and
the range of an operator $A$.

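Proposition 4.1. Consider a map $\nu\colon \mathcal P \to \mathcal B$ given by $\nu(P_{\theta,G}) = \tilde\nu(G)$.
Fix $\theta$, $G$ and $\mathcal C_G$, let $\mathcal P_1(G)$ be a regular parametric model as in Definition 2.2,
and let $\mathcal P$ be Hellinger differentiable as in Definition 4.1. If:

(i) $\dot{\mathcal G}$ is a closed and linear subspace of $\mathcal H$, that is, $\dot{\mathcal G} = [\dot{\mathcal G}^0]$,

(ii) $\tilde\nu\colon \mathcal G \to \mathcal B$ is pathwise differentiable at $G$, as in (4.4),

(iii) $I_*(\theta)$ from (4.11) is nonsingular,

then

A. The efficient influence function at $P_{\theta,G}$ for estimating $\theta$ in $\mathcal P$ is given
by

(4.12)    $\tilde l_1 = I_*^{-1}(\theta)\,l_1^*.$

B. The map $\nu\colon \mathcal P_2(\theta) \to \mathcal B$ is pathwise differentiable at $P_{\theta,G}$ if and only
if

(4.13)    $\dot{\tilde\nu}_{b^*} \in R(\dot l_2^T) \qquad \forall b^* \in \mathcal B^*.$

The efficient influence functions of $\nu$ are related to the gradients of $\tilde\nu$ by

(4.14)    $\dot{\tilde\nu}_{b^*} = \dot l_2^T\tilde\nu_{b^*}, \qquad \tilde\nu_{b^*} \in \dot{\mathcal P}_2.$

If also

(4.15)    $\dot{\tilde\nu}_{b^*} \in R(\dot l_2^T\dot l_2)$ for all $b^* \in \mathcal B^*$,

then the unique solution of (4.14) is given by

(4.16)    $\tilde\nu_{b^*} = \dot l_2(\dot l_2^T\dot l_2)^{-}\dot{\tilde\nu}_{b^*}.$

C. The map $\nu\colon \mathcal P \to \mathcal B$ is pathwise differentiable at $P_{\theta,G}$ if and only
if (4.13) holds. The efficient influence functions of $\nu$ are related to the gradients
of $\tilde\nu$ by

(4.17)    $0 = \langle\dot l_1, \tilde\nu_{b^*}\rangle_\theta, \qquad \dot{\tilde\nu}_{b^*} = \dot l_2^T\tilde\nu_{b^*}.$

If also $\dot{\tilde\nu}_{b^*} \in R(\dot l_2^T\dot l_2)$, for all $b^* \in \mathcal B^*$, then the unique solution of (4.17) is
given by

(4.18)    $\tilde\nu_{b^*} = \dot l_2(\dot l_2^T\dot l_2)^{-}\dot{\tilde\nu}_{b^*} - \langle\dot l_2(\dot l_2^T\dot l_2)^{-}\dot{\tilde\nu}_{b^*}, \dot l_1^T\rangle_\theta\, I_*^{-1}(\theta)\,l_1^*.$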

Parts B and C of this proposition are due to van der Vaart (1991); see his 
Theorem 3.1, formula (3.10) and Corollary 6.2. The gist of formula (4.18) is 
already contained in Begun, Hall, Huang and Wellner (1983), (4.4) and (3.1). 
Proofs of the proposition may be found also in Bickel, Klaassen, Ritov and Wellner 
(1993); see their Corollary 3.4.1, Theorem 5.4.1 and Corollaries 5.4.2 and 5.5.2. 

Note however, that they need the conditions $\dot{\mathcal P}_2 = R(\dot l_2)$ and $\dot{\mathcal P} = R(\dot l)$. This
is caused by their definition of tangent space $\dot{\mathcal P}$ as the closed linear span
in $L_2^0(P_{\theta,G})$ of all possible tangents $\rho \in L_2(P_{\theta,G})$ that may be obtained
via some path $\{P_\eta : |\eta| < 1, P_\eta \in \mathcal P\}$. In any particular model, the goal is
construction of efficient estimators. The convolution theorem implies that 
if efficient estimators exist, they are asymptotically linear in the efficient 
influence functions; see Theorem 2.1 of van der Vaart (1991) and Theo- 
rems 3.3.2, 5.2.1 and 5.2.2 of Bickel, Klaassen, Ritov and Wellner (1993). In 
principle, the variances of the efficient influence functions corresponding to 
Bickel, Klaassen, Ritov and Wellner (1993) equal at least those correspond- 
ing to van der Vaart (1991), and should they differ, efficient estimators in 
the sense of van der Vaart (1991) do not exist. However, in practice esti- 
mators can be constructed that are efficient in this sense for appropriate 
choices of C, which implies that they have to be efficient in the sense of 
Bickel, Klaassen, Ritov and Wellner (1993) as well. Of course, the advan- 
tage of the present approach is that the extra conditions mentioned above 
need not be verified now. 

If also $N(\dot l_2) = \{0\}$ and $R(\dot l_2)$ is closed, then $\dot l_2^T\dot l_2$ is one-to-one and onto,
so $(\dot l_2^T\dot l_2)^{-}$ may be replaced by $(\dot l_2^T\dot l_2)^{-1}$. In this case all parameters $\nu(P)$
expressible as pathwise differentiable functions of $G$ are pathwise differentiable;
see Corollary 3.3 of van der Vaart (1991).

5. Efficient estimation of Banach parameters. Let $X_1, \ldots, X_n$ be an i.i.d.
sample from $P_{\theta,G} \in \mathcal P$, a semiparametric model as in (1.1). In this section
we shall construct an efficient estimator of $\nu(P_{\theta,G}) = \tilde\nu(G) \in \mathcal B$ based on
$X_1, \ldots, X_n$ within the model $\mathcal P$, using the constructions and the heredity of
asymptotic linearity as studied in Section 2. As described in Section 1, we
start with an efficient estimator of $\tilde\nu(G)$ within the submodel $\mathcal P_2(\theta)$, where
$\theta$ is fixed and known and $G$ varies in $\mathcal G$. An estimator of $\nu(P_{\theta,G}) = \tilde\nu(G)$
within $\mathcal P_2(\theta)$ is of course allowed to depend on $\theta$. Let $\hat\nu_{\theta,n}$ be such a submodel
estimator. In view of part B of Proposition 4.1 this estimator is efficient
within the submodel $\mathcal P_2(\theta)$ with respect to the chosen collection $\mathcal C_G$ of paths
if it is asymptotically linear in the efficient influence function given in (4.16)
with the score operator $\dot l_2 = \dot l_2(\theta, G)\colon \dot{\mathcal G} \to L_2(P_{\theta,G})$ at $(\theta, G)$ depending on
$\theta$ and $G$. Note that $\dot l_2$ depends on $\mathcal C_G$ since $\dot{\mathcal G}$ does. We shall need this
asymptotic linearity locally uniformly in $\theta$, in the same way as in (3.3).

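Definition 5.1. Fix a subset $\mathcal B_0^*$ of $\mathcal B^*$, the dual space of the Banach
space $\mathcal B$. The submodel estimator $\hat\nu_{\theta,n}$ is called $\mathcal B_0^*$-weakly locally submodel
efficient at $P_{\theta_0,G}$ if for every sequence $\{\theta_n\}$ with $\{\sqrt n(\theta_n - \theta_0)\}$ bounded,
and every $b^* \in \mathcal B_0^*$,

(5.1)    $\sqrt n\,\Bigl|\,b^*(\hat\nu_{\theta_n,n} - \tilde\nu(G)) - \frac1n\sum_{i=1}^n \tilde\psi(X_i;\theta_n,G;b^*)\Bigr| \stackrel{P}{\longrightarrow} 0$

holds with

(5.2)    $\tilde\psi(x;\theta,G;b^*) = [\dot l_2(\theta,G)(\dot l_2^T(\theta,G)\dot l_2(\theta,G))^{-}\dot{\tilde\nu}_{b^*}](x).$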

The main result of our paper states that, under regularity conditions,
if $\hat\theta_n = t_n(X_1, \ldots, X_n)$ is an efficient estimator of $\theta$ in $\mathcal P$ and if $\hat\nu_{\theta,n} =
u_n(X_1, \ldots, X_n; \theta)$ is a weakly locally submodel efficient estimator of $\nu(P_{\theta,G}) =
\tilde\nu(G)$ at $\theta_0$, then the substitution estimator $\hat\nu_{\hat\theta_n,n}$ is an efficient estimator of $\nu$
at $\theta_0$ in the semiparametric model $\mathcal P$; see the discussion in Section 4 after
Proposition 4.1. If $\hat\nu_{\theta,n}$ is sufficiently smooth in $\theta$, this substitution estimator
itself may be proved to be efficient; see Theorem 5.2 below. Without this
extra condition we have to resort to a split-sample version of the substitution
estimator, as in Section 2. Fix a sequence of integers $\{\lambda_n\}_{n=1}^{\infty}$ such
that (2.7) holds, and define $\hat\theta_{n1}$ and $\hat\theta_{n2}$ as in (2.8). Analogously to (2.8)
and (2.9), write

(5.3)    $\hat\nu^{(1)}_{\theta,\lambda_n} = u_{\lambda_n}(X_1, \ldots, X_{\lambda_n}; \theta), \qquad \hat\nu^{(2)}_{\theta,n-\lambda_n} = u_{n-\lambda_n}(X_{\lambda_n+1}, \ldots, X_n; \theta)$

and

(5.4)    $\hat\nu_n = \frac{\lambda_n}{n}\,\hat\nu^{(1)}_{\hat\theta_{n2},\lambda_n} + \frac{n-\lambda_n}{n}\,\hat\nu^{(2)}_{\hat\theta_{n1},n-\lambda_n}.$
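
The split-sample construction can be sketched in Python as follows. This is a minimal illustration under our reading of (5.3) and (5.4): each block's submodel estimator is evaluated at the Euclidean estimate computed from the other block, and the two results are combined with weights proportional to the block sizes. The function `split_sample_estimate` and its callback arguments `theta_hat` and `nu_hat` are hypothetical names introduced here, not notation from the paper.

```python
import numpy as np

def split_sample_estimate(x, lam, theta_hat, nu_hat):
    """Split-sample substitution estimator (illustrative sketch).

    theta_hat(block)      -> estimate of theta from a block of observations
    nu_hat(block, theta)  -> submodel estimate of nu from a block, given theta
    """
    x = np.asarray(x)
    n = len(x)
    first, second = x[:lam], x[lam:]
    theta1 = theta_hat(first)       # based on X_1, ..., X_lam
    theta2 = theta_hat(second)      # based on X_{lam+1}, ..., X_n
    nu1 = nu_hat(first, theta2)     # first block uses theta from the second block
    nu2 = nu_hat(second, theta1)    # second block uses theta from the first block
    return (lam / n) * nu1 + ((n - lam) / n) * nu2
```

For instance, with `theta_hat = np.mean` and `nu_hat = lambda block, theta: np.mean((block - theta) ** 2)` this gives a split-sample variant of the sample variance.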

To prove efficiency of this estimator at $\theta_0 \in \Theta$ we will need the following
smoothness condition, which is similar to (2.6). For every $h \in \dot{\mathcal G}$ and every
sequence $\{\theta_n\}$ with $\{\sqrt n(\theta_n - \theta_0)\}$ bounded,

(5.5)

and

(5.6)

hold with

(5.7)    $\psi_h(x;\theta,G) = [\dot l_2(\theta,G)(\dot l_2^T(\theta,G)\dot l_2(\theta,G))^{-}h](x)$

and

(5.8)    $c_h(\theta) = -E_\theta(\psi_h(X_1;\theta,G)\,\dot l_1^T(\theta)(X_1)).$

Furthermore, we write [cf. (5.2)]

(5.9)    $c(\theta,G;b^*) = -E_\theta(\tilde\psi(X_1;\theta,G;b^*)\,\dot l_1^T(\theta)(X_1)).$

Lemmas 2.1 and 2.2 might be useful in checking conditions (5.5) and (5.6).
Our main result is efficiency of $\hat\nu_n$ as follows.

Theorem 5.1. Fix $\theta_0 \in \Theta$ and $\mathcal B_0^* \subset \mathcal B^*$. Suppose that (5.5), (5.6) and
the conditions of Proposition 4.1 are satisfied in model (1.1) for appropriately
chosen collections $\mathcal C_G$ of paths. Suppose that the submodel estimator
$\hat\nu_{\theta,n}$ is $\mathcal B_0^*$-weakly locally submodel efficient as in (5.1) and that (4.15) holds
at $P_{\theta_0,G}$. If efficient estimation of $\theta$ is possible within $\mathcal P$ and if $\hat\theta_n$ is an efficient
estimator of $\theta$ in $\mathcal P$, then $\hat\nu_n$ defined by (2.7), (2.8), (5.3) and (5.4)
is a $\mathcal B_0^*$-weakly efficient estimator of $\nu$ from (1.2) within the full model $\mathcal P$
at $P_{\theta_0,G}$; that is, for every sequence $\{\theta_n\}$ with $\{\sqrt n(\theta_n - \theta_0)\}$ bounded and
every $b^* \in \mathcal B_0^*$ [cf. (2.10), (4.18), (5.2) and (5.9)],

(5.10)    $\sqrt n\,\Bigl|\,b^*(\hat\nu_n - \nu(P_{\theta_n,G})) - \frac1n\sum_{i=1}^n \bigl[\tilde\psi(X_i;\theta_n,G;b^*) + c(\theta_n,G;b^*)\,I_*^{-1}(\theta_n)\,l_1^*(\theta_n)(X_i)\bigr]\Bigr| \stackrel{P}{\longrightarrow} 0.$

Proof. For every $b^* \in \mathcal B_0^*$, Theorem 2.1 may be applied and the local
asymptotic linearity in (5.10) may be seen to yield efficiency via Proposition 4.1.C,
(5.2) and (5.9). □

A closer look at the proofs of Theorems 5.1 and 2.1 with $\kappa(P) = b^*\nu(P)$ reveals
that if the orthogonality

(5.11)    $[\dot l_1(\theta_0)] \perp \dot{\mathcal P}_2(\theta_0)$

holds, then $c(\theta_0,G;b^*)$ and the last term at the left-hand sides of (2.12) and (2.14)
vanish, as does the second term at the right-hand side of (4.18). Hence it
suffices for $\hat\theta_n$ to be $\sqrt n$-consistent at $\theta_0$ and we do not need (5.6) and (2.7),
but instead

(5.12)    $0 < \liminf_{n\to\infty}\frac{\lambda_n}{n} \le \limsup_{n\to\infty}\frac{\lambda_n}{n} < 1.$

We formulate this special case as a corollary.



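Corollary 5.1 (Adaptive case). Fix $\theta_0 \in \Theta$ and $\mathcal B_0^* \subset \mathcal B^*$. Suppose that
the conditions of Proposition 4.1 are satisfied in model (1.1) for appropriately
chosen collections $\mathcal C_G$ of paths and that for all sequences $\{\theta_n\}$ with
$\{\sqrt n(\theta_n - \theta_0)\}$ bounded,

(5.13)    $\Bigl|\frac{1}{\sqrt n}\sum_{i=1}^n \psi_h(X_i;\theta_n,G) - \frac{1}{\sqrt n}\sum_{i=1}^n \psi_h(X_i;\theta_0,G)\Bigr| \stackrel{P}{\longrightarrow} 0.$

Suppose furthermore that $\hat\nu_{\theta,n}$ is $\mathcal B_0^*$-weakly locally submodel efficient as in (5.1)
and that (4.15) holds at $P_{\theta_0,G}$. If $\hat\theta_n$ is a $\sqrt n$-consistent estimator at $\theta_0$ and if
the orthogonality (5.11) holds, then $\hat\nu_n$ defined by (5.4) and (5.12) is a weakly
efficient estimator of $\nu$ from (1.2) within the full model $\mathcal P$ at $P_{\theta_0,G}$; that is,
for every sequence $\{\theta_n\}$ with $\{\sqrt n(\theta_n - \theta_0)\}$ bounded and every $b^* \in \mathcal B_0^*$,

(5.14)    $\sqrt n\,\Bigl|\,b^*(\hat\nu_n - \nu(P_{\theta_n,G})) - \frac1n\sum_{i=1}^n \tilde\psi(X_i;\theta_n,G;b^*)\Bigr| \stackrel{P}{\longrightarrow} 0.$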



Remark 5.1. Our main result, Theorem 5.1, states that, assuming sufficient
regularity of a semiparametric model $\mathcal P$, two conditions, namely efficiency
of an estimator $\hat\theta_n$ of the finite-dimensional parameter $\theta$ in the full
model $\mathcal P$, and submodel efficiency of an estimator $\hat\nu_{\theta,n}$ of a functional of
the infinite-dimensional parameter $\nu$ in the submodel $\mathcal P_2(\theta)$ with $\theta$ fixed,
are sufficient to guarantee efficiency of the combined estimator $\hat\nu_n = \hat\nu_{\hat\theta_n,n}$
in $\mathcal P$. The result derives from general expressions for the influence functions
of substitution estimators of Section 2. These expressions can be used to
pinpoint what is needed in terms of efficiency or what is allowed in terms of
deviations from efficiency of the separate estimators $\hat\theta_n$ and $\hat\nu_{\theta,n}$ to achieve
efficiency of the substitution estimator $\hat\nu_n$. Here we will derive conditions
heuristically. Let $\hat\theta_n$ and $\hat\nu_{\theta,n}$ be asymptotically linear estimators with influence
functions $\psi_1$ and $\psi_2$, respectively. Without loss of generality, they
may be written as $\psi_1 = \tilde l_1 + \Delta_1$ and $\psi_2 = I_{22}^{-1}\dot l_2 + \Delta_2$, with $\Delta_1 \perp [\dot l_1, \dot l_2]$
and $\Delta_2 \perp \dot l_2$; see Proposition 3.3.1 of Bickel, Klaassen, Ritov and Wellner
(1993). Then by Theorem 2.1 and (2.20) and (2.10) the influence function
of the substitution estimator is given by

    $\psi_2 + c\psi_1 = I_{22}^{-1}\dot l_2 + \Delta_2 - (E((I_{22}^{-1}\dot l_2 + \Delta_2)\dot l_1^T))(\tilde l_1 + \Delta_1)$

    $\phantom{\psi_2 + c\psi_1} = I_{22}^{-1}\dot l_2 - I_{22}^{-1}I_{21}\tilde l_1 + \Delta_2 - I_{22}^{-1}I_{21}\Delta_1 - (E(\Delta_2\dot l_1^T))(\tilde l_1 + \Delta_1),$

which equals the efficient influence function $\tilde l_2 = I_{22}^{-1}\dot l_2 - I_{22}^{-1}I_{21}\tilde l_1$ if and only
if

(5.15)    $\Delta_2 = I_{22}^{-1}I_{21}\Delta_1 + (E(\Delta_2\dot l_1^T))(\tilde l_1 + \Delta_1)$

holds. If $\hat\theta_n$ is efficient, that is, if $\Delta_1 = 0$, (5.15) shows that we need $\Delta_2 =
(E(\Delta_2\dot l_1^T))\tilde l_1$; so deviations from efficiency of $\hat\nu_{\theta,n}$ are permitted, provided they are
in $[\tilde l_1]$, that is, provided these deviations are matrix multiples of $\hat\theta_n - \theta$. An
example of this phenomenon is given in (6.26) in Example 6.4. If $\hat\nu_{\theta,n}$ is
efficient, that is, if $\Delta_2$ vanishes, (5.15) reduces to $I_{22}^{-1}I_{21}\Delta_1 = 0$. This means
that in the adaptive case ($I_{21} = 0$), $\hat\theta_n$ need not be efficient (see Corollary 5.1)
and that in the nonadaptive case $\hat\theta_n$ has to be efficient in order to obtain
efficiency of $\hat\nu_n$. Of course, also combinations of estimators are possible where
neither of them is efficient, but in this case only a lucky shot might yield an
efficient combined estimator $\hat\nu_n$.



Remark 5.2. The first occurrences of the terminology "adaptive esti- 
mators" are in Beran (1974) and Stone (1975). In Pfanzagl and Wefelmeyer 
[(1982), pages 14 and 15], it is argued that this terminology is rather unfortunate
since "adaptiveness" is a property of the model, namely that (5.11)
holds, and not of the estimators, which are just semiparametrically efficient.
Van Eeden (1970), who was the first to construct partially adaptive estimators
of location in the one- and two-sample problem, calls her estimators
efficiency-robust. Since the terminology of adaptiveness is quite common
nowadays, we will stick to it, although Pfanzagl and Wefelmeyer (1982) are
right, and we will call $\hat\nu_n$ of Corollary 5.1 an adaptive estimator of the Banach
parameter $\nu(P)$.



Remark 5.3. In the adaptive situation of the corollary the direct substitution
estimator $\hat\nu_{\hat\theta_n,n}$ can also be shown to be efficient in the sense of (5.14)
if $\hat\theta_n$ takes its values in a grid on $\mathbb R^k$ with meshwidth of the order $O(n^{-1/2})$.
This is the classical discretization technique of Le Cam (1956), which has
also been used in our proof of Theorem 2.2.

The next theorem states that the direct substitution estimator $\hat\nu_{\hat\theta_n,n}$ is
efficient in the general semiparametric model, if $\hat\nu_{\theta,n}$ is sufficiently smooth
in $\theta$.

Theorem 5.2. Under the conditions of Theorem 5.1, let $\hat\theta_n$ be an efficient
estimator of $\theta$; in the adaptive situation of (5.11) it suffices that $\hat\theta_n$
be $\sqrt n$-consistent. Fix $b^* \in \mathcal B_0^*$. If for all $\delta > 0$, $\varepsilon > 0$ and $c > 0$, there exist
$\zeta > 0$ and $n_0 \in \mathbb N$ such that for all $n \ge n_0$,

(5.16)    $P_{\theta_0,G}\Bigl(\sup_{\sqrt n|\theta-\theta_0|\le c,\ \sqrt n|\tilde\theta-\theta|\le\zeta}\sqrt n\,|b^*(\hat\nu_{\theta,n} - \hat\nu_{\tilde\theta,n})| \ge \varepsilon\Bigr) \le \delta$

holds, then the substitution estimator $b^*\hat\nu_{\hat\theta_n,n}$ is an efficient estimator of $b^*\nu$
with $\nu$ from (1.2) within the full model $\mathcal P$ at $P_{\theta_0,G}$; that is, it satisfies (5.10).

Proof. Note that (5.16) is a translation of (2.15) and apply Theorem 2.2. □

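Remark 5.4. In the special case where $\mathcal G$ may be identified with a subset
of Euclidean space, Theorem 5.1, Corollary 5.1 and Theorem 5.2 also apply.
Here we give a heuristic argument why these results might be true in the
Euclidean and hence the general case. Let $\Theta \subset \mathbb R^k$ and $H \subset \mathbb R^l$ and let

    $\mathcal P = \{P_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}$

be a regular $(k + l)$-dimensional parametric model in the sense of Definition 2.2.
We have identified $\mathcal G$ with $H$ and hence we have $\dot{\mathcal G} = \mathbb R^l$, provided
the class $\mathcal C$ of allowed paths is large enough. Define

    $\dot l_1(\theta,\eta) = \frac{\partial}{\partial\theta}\log p(x;\theta,\eta)$ and $\dot l_2(\theta,\eta) = \frac{\partial}{\partial\eta}\log p(x;\theta,\eta)$

as the score functions for $\theta$ and $\eta$, respectively, and the Fisher information
matrix by

    $I(\theta,\eta) = \begin{pmatrix} I_{11}(\theta,\eta) & I_{12}(\theta,\eta) \\ I_{21}(\theta,\eta) & I_{22}(\theta,\eta) \end{pmatrix}$

with $I_{ij}(\theta,\eta) = E\dot l_i\dot l_j^T(\theta,\eta)$. Regularity of $\mathcal P$ implies that $I_{22}(\theta,\eta)$ and $I(\theta,\eta)$
are nonsingular. The efficient score function for estimating $\theta$ is given by

    $l_1^*(\theta,\eta) = \dot l_1(\theta,\eta) - I_{12}I_{22}^{-1}\dot l_2(\theta,\eta),$

and with

    $I_*(\theta,\eta) = El_1^*l_1^{*T}(\theta,\eta),$

the efficient influence function for estimating $\theta$ is given by (cf. Proposition 4.1.A)

    $\tilde l_1(\theta,\eta) = I_*^{-1}(\theta,\eta)\,l_1^*(\theta,\eta).$

We are interested in estimation of $\nu(\theta,\eta) = \tilde\nu(\eta)$ within $\mathcal P$. Let $\tilde\nu\colon \mathbb R^l \to \mathbb R^m$
be differentiable with $(m \times l)$ partial derivative matrix $\dot{\tilde\nu}$. Now $\dot{\tilde\nu}(\eta)I_{22}^{-1}\dot l_2(\theta,\eta)$
is the efficient influence function for estimating $\tilde\nu(\eta)$ in the submodel $\mathcal P_2(\theta)$.
This coincides with formula (4.16) of Proposition 4.1.B. Note that the operator
$\dot l_2\colon \mathbb R^l \to L_2^0(P_{\theta,\eta})$ is represented by the column $l$-vector $\dot l_2$ via

(5.17)    $\dot l_2(a) = a^T\dot l_2, \qquad a \in \mathbb R^l,$

that the operator $\dot l_2^T\dot l_2\colon \mathbb R^l \to \mathbb R^l$ is represented by the nonsingular $(l \times l)$-matrix
$I_{22}(\theta,\eta)$, and that

(5.18)    $\dot{\tilde\nu}_{b^*} = \dot{\tilde\nu}^T(\eta)\,b^*, \qquad b^* \in \mathbb R^m.$

According to formula (4.18) of Proposition 4.1.C the efficient influence function
$\tilde l_\nu$ for estimating $\tilde\nu(\eta)$ in the full model $\mathcal P$ is given by

(5.19)    $\tilde l_\nu(\theta,\eta) = \dot{\tilde\nu}(\eta)I_{22}^{-1}(\dot l_2(\theta,\eta) - I_{21}\tilde l_1(\theta,\eta)).$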
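
In this Euclidean setting the formulas can be checked with plain matrix algebra. The Python sketch below is our own illustration (the helper name `efficient_blocks` is hypothetical, and we take the derivative matrix of the parameter of interest to be the identity): it computes the efficient information for $\theta$ and the covariance matrix of $I_{22}^{-1}(\dot l_2 - I_{21}\tilde l_1)$ from a partitioned Fisher information matrix, so that both can be compared with the corresponding blocks of the inverse Fisher information matrix.

```python
import numpy as np

def efficient_blocks(I, k):
    """Efficient information for theta and covariance of I22^{-1}(l2 - I21 * l1_tilde).

    I is a (k+l) x (k+l) Fisher information matrix, partitioned after the first k
    coordinates; l1_tilde = I_*^{-1} l1_star is the efficient influence function for theta.
    """
    I11, I12 = I[:k, :k], I[:k, k:]
    I21, I22 = I[k:, :k], I[k:, k:]
    I22_inv = np.linalg.inv(I22)
    I_star = I11 - I12 @ I22_inv @ I21      # covariance of the efficient score l1_star
    # Cov(l2, l1_star) = 0, hence Var(l2 - I21 l1_tilde) = I22 + I21 I_*^{-1} I12
    cov_eta = I22_inv @ (I22 + I21 @ np.linalg.inv(I_star) @ I12) @ I22_inv
    return I_star, cov_eta
```

By the block inversion formula, the inverse of the first output and the second output should agree with the upper-left and lower-right blocks of `np.linalg.inv(I)`, respectively.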

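Fix $\theta_0$ and suppose that $\hat\nu_{\theta,n}$ is a (weakly) locally submodel efficient estimator
of $\tilde\nu(\eta)$ within $\mathcal P_2(\theta_0)$, that is,

(5.20)    $\sqrt n\,\Bigl|\hat\nu_{\theta_n,n} - \tilde\nu(\eta) - \frac1n\sum_{i=1}^n \dot{\tilde\nu}(\eta)I_{22}^{-1}\dot l_2(\theta_n,\eta)(X_i)\Bigr| \stackrel{P}{\longrightarrow} 0.$

Substituting an estimator $\hat\theta_n$ of $\theta$ for $\theta_n$ with influence function $\psi$ under $\theta_0$
and using Taylor's expansion and the weak law of large numbers, we can
formally argue as follows:

    $\sqrt n\,(\hat\nu_{\hat\theta_n,n} - \tilde\nu(\eta)) \approx \frac{1}{\sqrt n}\sum_{i=1}^n \dot{\tilde\nu}(\eta)I_{22}^{-1}\dot l_2(\hat\theta_n,\eta)(X_i)$

    $\approx \frac{1}{\sqrt n}\sum_{i=1}^n \Bigl\{\dot{\tilde\nu}(\eta)I_{22}^{-1}\dot l_2(\theta_0,\eta)(X_i) + \dot{\tilde\nu}(\eta)\frac{\partial}{\partial\theta^T}I_{22}^{-1}\dot l_2(\theta_0,\eta)(X_i)\,(\hat\theta_n - \theta_0)\Bigr\}$

    $\approx \frac{1}{\sqrt n}\sum_{i=1}^n \dot{\tilde\nu}(\eta)I_{22}^{-1}\dot l_2(\theta_0,\eta)(X_i) + \dot{\tilde\nu}(\eta)\,E_{\theta_0}\Bigl(\frac{\partial}{\partial\theta^T}I_{22}^{-1}\dot l_2(\theta_0,\eta)(X_1)\Bigr)\frac{1}{\sqrt n}\sum_{i=1}^n \psi(X_i).$

By partial integration we have, under regularity conditions,

    $E_{\theta_0}\Bigl(\frac{\partial}{\partial\theta^T}I_{22}^{-1}\dot l_2(\theta_0,\eta)(X_1)\Bigr)$

    $= \frac{\partial}{\partial\theta^T}\int I_{22}^{-1}\dot l_2(\theta_0,\eta)\,p(\theta_0,\eta)\,d\mu - \int I_{22}^{-1}\dot l_2(\theta_0,\eta)\,\frac{\partial}{\partial\theta^T}p(\theta_0,\eta)\,d\mu$

    $= -\int I_{22}^{-1}\dot l_2\dot l_1^T\,p(\theta_0,\eta)\,d\mu = -I_{22}^{-1}I_{21}(\theta_0,\eta).$

This means that the influence function of $\hat\nu_{\hat\theta_n,n}$ equals

(5.21)    $\dot{\tilde\nu}(\eta)I_{22}^{-1}(\dot l_2(\theta_0,\eta) - I_{21}(\theta_0,\eta)\psi),$

which corresponds to $\tilde l_\nu(\theta_0,\eta)$ from (5.19) if $\psi = \tilde l_1$, that is, if $\hat\theta_n$ is efficient
in $\mathcal P$.

The regularity of $\mathcal P$ and in particular continuity and nonsingularity of
$I_{22}(\theta,\eta)$ imply (2.25), and hence Lemma 2.3 yields (5.5). Consequently, by
Theorem 5.1 a split-sample modification of $\hat\nu_{\hat\theta_n,n}$ is an efficient estimator of
$\nu$, if (5.20) is valid. By arguments as in Gong and Samaniego (1981) and
under their extra regularity conditions it may be verified that the submodel
maximum likelihood estimator $\hat\nu_{\theta,n}$ given $\theta$ satisfies both (5.20) and (5.16).
Then Theorem 5.2 shows that $\hat\nu_{\hat\theta_n,n}$ is efficient if $\hat\theta_n$ is. Gong and Samaniego
(1981) prove this directly and they call $\hat\nu_{\hat\theta_n,n}$ a pseudo maximum likelihood
estimator.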

6. Examples. In this section we shall present a number of examples that 
illustrate our main results, namely Theorem 5.1, Corollary 5.1 and Theo- 
rem 5.2. The first example expands on Example 3.1. The next examples are 
important semiparametric test-cases well known from textbooks; our results 
should in any case be applicable for those examples. Example 6.2 treats lin- 
ear regression, which was used in Section 1 for motivation, for the particular 
case of a symmetric error distribution. For a possibly asymmetric error dis- 
tribution we study the location problem in Example 6.4. These statistical 
models are parametrized linkage models, which are discussed in Example 6.3. 
A recurring theme in these examples is the idea that estimators based on 
residuals are actually estimators based on the unobservable errors, with the 
unknown parameter needed to construct these errors replaced by suitable 
estimators; see also Example 3.2. Example 6.5 considers the bootstrap, and 
we conclude in Example 6.6 with another well-known semiparametric model: 
the Cox proportional hazards model.

Example 6.1 (Efficiency of sample variance). In Example 3.1 we have
shown the local asymptotic linearity of the sample variance in the class of
all distributions with finite fourth moment. At any point $P$ of this model $\mathcal P$
the tangent space is maximal and equals $\dot{\mathcal P} = L_2^0(P)$, provided the collection
$\mathcal C_P$ of paths in $\mathcal P$ is chosen sufficiently large; see Example 3.2.1
of Bickel, Klaassen, Ritov and Wellner (1993) for an explicit construction,
which is also valid in our more general framework. Consequently, any locally
asymptotically linear estimator of the variance is efficient; see Theorem 3.3.1
of Bickel, Klaassen, Ritov and Wellner (1993). In particular, the
sample variance is efficient. Of course, this conclusion can also be drawn
from Theorem 5.2, since $n^{-1}\sum_{i=1}^n (X_i - \theta)^2$ is efficient within $\mathcal P_2(\theta)$ and $\bar X_n$
within $\mathcal P$ for the same reasons of linearity and maximal tangent spaces. This
line of argument may be used to show efficiency of all sample central moments
and, more generally still, for all functions $h$ with $n^{-1}\sum_{i=1}^n h(X_i - \bar X_n)$
estimating $\nu(G) = \int h\,dG$ within an appropriately broad class of distribution
functions $G$.
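
The estimators of this example are simple plug-in averages, as the following Python sketch indicates. The helper name `centered_functional` and its optional `theta` argument are our own illustrative choices: with a known location the average of $h(X_i - \theta)$ is computed, and without it the sample mean is substituted.

```python
import numpy as np

def centered_functional(x, h, theta=None):
    """Average of h(X_i - theta); if theta is not given, the sample mean is substituted."""
    x = np.asarray(x, dtype=float)
    if theta is None:
        theta = x.mean()             # substitution of the estimated location
    return float(np.mean(h(x - theta)))
```

With `h = np.square` and no `theta` this returns the sample variance (with divisor $n$); other choices of `h` give other sample central moments.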

Example 6.2 (Symmetric error distribution in linear regression). Suppose
we observe realizations of $X_i = (Y_i, Z_i)$, $i = 1, \ldots, n$, which are i.i.d.
copies of $X = (Y, Z)$. The random $k$-vector $Z$ and the random variable $Y$
are related by

(6.1)    $Y = \theta^T Z + \varepsilon,$

where $\varepsilon$ is independent of $Z$ and symmetrically distributed about 0 with
unknown distribution function $G$ and density $g$ with respect to Lebesgue
measure $\lambda$. For deriving lower bounds we assume that $Z$ has known distribution
$F$ and that $EZZ^T$ is nonsingular. Note that the unknown Euclidean
parameter $\theta \in \mathbb R^k$ is identifiable via

(6.2)    $\theta = (EZZ^T)^{-1}E(Z\,m(Y \mid Z)),$

where $m(Y \mid Z)$ denotes the median of the conditional symmetric distribution
of $Y$ given $Z$. We are interested in estimating the symmetric error
distribution $\nu(P_{\theta,G}) = \tilde\nu(G) = G$.

The density of $X$ with respect to $\lambda \times F$ is given by

(6.3)    $p(x;\theta,G) = p(y,z;\theta,G) = g(y - \theta^T z).$

We assume that $G$ has finite Fisher information $I(G) = \int (g'/g)^2 g\,d\lambda$ for
location, and hence we have

(6.4)    $\dot l_1(\theta)(x) = \dot l_1(x;\theta,G) = -z\,\frac{g'}{g}(y - \theta^T z) = -z\,\frac{g'}{g}(\varepsilon)$

and

(6.5)    $\mathcal G = \Bigl\{g \in L_\infty(\lambda) : g > 0,\ \int g\,d\lambda = 1,\ g(-\cdot) = g(\cdot),\ I(G) < \infty\Bigr\}.$

We embed $\mathcal G$ into $\mathcal H = L_2(\lambda)$ by taking square roots of densities. The Fisher
information $I(\cdot)$ for location is lower semicontinuous on $\mathcal G$. Therefore, we
will restrict $\mathcal C_G$ to those paths on which $I(\cdot)$ is continuous. Such paths
may be constructed in the same way as at the end of Example 3.2.1 of
Bickel, Klaassen, Ritov and Wellner (1993). Then we have, embedding $\dot{\mathcal G}$
into $L_2^0(G)$,

(6.6)    $\dot{\mathcal G}^0 = \{h \in L_2^0(G) : h(-\cdot) = h(\cdot),\ h' \in L_2(G)\}, \qquad \dot{\mathcal G} = \{h \in L_2^0(G) : h(-\cdot) = h(\cdot)\}.$

Note that $\dot l_2(\theta)$ is the embedding of $\dot{\mathcal G}$ into $L_2^0(P_{\theta,G})$ given by

(6.7)    $h \mapsto h(Y - \theta^T Z),$

whence $\dot l_2(\theta)(\dot{\mathcal G}) = \dot{\mathcal P}_2$. The finiteness and positivity of the Fisher information
$I(G)$, the nonsingularity of $EZZ^T$, the choice of $\mathcal C_G$, the $L_2$-continuity
theorem for translations and (6.7) ensure regularity and Hellinger differentiability
as described in Definitions 2.2 and 4.1, respectively. Furthermore,
the symmetry of $h \in \dot{\mathcal G}$ and antisymmetry of $\dot l_1(\theta)$ imply

(6.8)    $\dot l_1(\theta) \perp \dot l_2(\theta)h.$

Thus we are in an adaptive situation here.
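
The orthogonality behind this adaptive situation can be checked numerically for a particular symmetric error law. The Python sketch below is an illustration under our own assumptions (standard logistic errors, simple quadrature on a symmetric grid, and the hypothetical helper name `score_inner_product`): it approximates $E\{(-g'/g)(\varepsilon)h(\varepsilon)\}$, the error-dependent factor of the inner product between the location score in (6.4) and $h(\varepsilon)$.

```python
import numpy as np

def score_inner_product(h, grid=np.linspace(-40.0, 40.0, 160001)):
    """Quadrature for E[(-g'/g)(eps) h(eps)] under the standard logistic density g."""
    dx = grid[1] - grid[0]
    a = np.exp(-np.abs(grid))
    g = a / (1.0 + a) ** 2                    # logistic density, written to avoid overflow
    location_score = np.tanh(grid / 2.0)      # equals -g'/g for the logistic density
    return float(np.sum(location_score * h(grid) * g) * dx)
```

For an even function such as `lambda e: e**2 - np.pi**2 / 3` the result should vanish up to rounding error, while for the odd function `lambda e: e` it should be close to 1.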

The map $\tilde\nu\colon \mathcal G \to \mathcal B$, the cadlag functions on $[-\infty,\infty]$ with sup-norm, is
pathwise differentiable at $G \in \mathcal G$ with derivative [cf. Example 5.3.3, page 193,
of Bickel, Klaassen, Ritov and Wellner (1993)]

(6.9)    $\dot{\tilde\nu}(h)(t) = \int \bigl(\tfrac12(\mathbf 1_{(-\infty,t]}(x) + \mathbf 1_{(-\infty,t]}(-x)) - G(t)\bigr)h(x)\,dG(x).$

Note that (6.6), (6.8) and (6.9) imply that the conditions of Proposition 4.1
are satisfied. Furthermore, the $L_2$-continuity theorem for translations implies
(2.25). Consequently, Lemma 2.3 shows the validity of (5.13). Finally,
note that (4.15) holds since $R(\dot l_2^T\dot l_2) = \dot{\mathcal G}$ and $\dot{\tilde\nu}_{b^*} \in \dot{\mathcal G}$ by definition.

With $\theta$ known, an efficient estimator of $G$ is the symmetrized empirical
distribution function of $\varepsilon_1, \ldots, \varepsilon_n$, given by

(6.10)    $\hat G_{\theta,n}(x) = \tfrac12\bigl(G_{\theta,n}(x) + \bar G_{\theta,n}(x)\bigr),$

where

(6.11)    $G_{\theta,n}(x) = n^{-1}\sum_{i=1}^n \mathbf 1_{[\varepsilon_i \le x]} = n^{-1}\sum_{i=1}^n \mathbf 1_{[Y_i - \theta^T Z_i \le x]}$

and

(6.12)    $\bar G_{\theta,n}(x) = 1 - \lim_{y \downarrow x} G_{\theta,n}(-y)$

[cf. Example 5.3.3, pages 193-195 of Bickel, Klaassen, Ritov and Wellner
(1993)]. We note that $\hat G_{\theta,n}$ is weakly locally submodel efficient, since it
is exactly linear in the efficient influence function; see just above (5.3.10),
page 194 of Bickel, Klaassen, Ritov and Wellner (1993). Finally, by a method
of Scholz (1971) we know that the maximum likelihood estimator of $\theta$ corresponding
to the logistic density exists under any density within our model.
Furthermore, this pseudo maximum likelihood estimator is $\sqrt n$-consistent
[cf. Example 7.8.2, page 401, of Bickel, Klaassen, Ritov and Wellner (1993)].
In fact, efficient and hence adaptive estimators of the regression parameter $\theta$
have been constructed, for example, by Dionne (1981), Bickel (1982) and
Koul and Susarla (1983).

Consequently, by Corollary 5.1 the split-sample estimator defined by (5.4), 
(6.10)-(6.12) and (5.12) is efficient. Note that this efficient estimator does 
not use any knowledge about the distribution of Z and hence is also adaptive 
with respect to the distribution of Z. 
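
For concreteness, the symmetrized empirical distribution function of (6.10)-(6.12) may be computed as in the following Python sketch. The function name `symmetrized_edf` and its argument layout are our own illustrative choices; passing an estimate of the regression parameter in place of `theta` gives the residual-based version.

```python
import numpy as np

def symmetrized_edf(y, Z, theta, x):
    """Symmetrized empirical d.f. of the errors Y_i - theta^T Z_i, evaluated at x."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    eps = y - Z @ np.atleast_1d(np.asarray(theta, dtype=float))
    plain = np.mean(eps <= x)         # empirical d.f. of the errors, as in (6.11)
    reflected = np.mean(-eps <= x)    # empirical d.f. of the reflected errors, as in (6.12)
    return float(0.5 * (plain + reflected))
```

By construction the result is symmetric about zero: at points `x` and `-x` that are not atoms of the error sample, the two values add up to one.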

Clearly, in practice one would not apply sample splitting, but use $\hat G_{\hat\theta_n,n}$
itself, which is the symmetrized empirical distribution function based on
the residuals $\hat\varepsilon_i = Y_i - \hat\theta_n^T Z_i$. This yields an efficient estimator of $G$ if $\hat\theta_n$
is discretized as described in Remark 5.3. Without discretization $\hat G_{\hat\theta_n,n}$ is
weakly efficient in the sense of Theorem 5.2 for most $b^* \in \mathcal B^*$, including
the evaluation maps. To see this it suffices to verify (5.16) for empirical
distributions of regression residuals, as is done in the following lemma.

Lemma 6.1. In the regression model (6.1) let the error have bounded
density $g$ (not necessarily symmetric) and let $E|Z|$ be finite. Let $b^* \in \mathcal B^*$
be such that there exists a finite signed measure $\mu$ with $b^*(b) = \int b(x)\,d\mu(x)$
and $\|b^*\| = |\mu|([-\infty,\infty]) < \infty$. For such $b^*$, the smoothness condition (5.16)
holds for $G_{\theta,n}$ from (6.11).



Proof. Let $b^*$ be given and let $\mu$ be the corresponding signed measure
with $\|b^*\| = C$. Let the density $g$ of $\varepsilon$ be bounded by $B$ and assume $E|Z| =
A$. For $\eta \in \mathbb R^k$ and $\lambda_n \to \infty$, $\lambda_n/\sqrt n \to 0$, Markov's inequality yields (note
$e^z - 1 \le 2z$ for $0 \le z$ sufficiently small)




n 



(6.13) 



n 



X]^K-eTZ,,oo)(-)- 



i=l 



P
: exp|nlog|^l + -B|^exp|^— ^ 



(6-14) X J l[\e^+i^T z^-r,\<2c\z^\/V^]d\fJ^\ix)^ -1 



<exp|2A„V^ j P{\ei+f Zi-x\ < - eA„| 

< exp{(8A5CC - e)\n] as n ^ oo. 
Since $Y - \theta^T Z \le x < Y - \tilde\theta^T Z$ implies $|Y - \theta^T Z - x| \le |\theta - \tilde\theta|\,|Z|$, we
obtain



sup 

v^|e-eo|<c,v^|6»-e|<c 

(6.15) 



f 1 
/■ 1 " 

- ^""P / -7^^M\e.+v'^z,-x\<c\z,\/v^]d\fi\{x). 



Consider the grid with meshwidth $2(kn)^{-1/2}\zeta$, with $k$ the dimension of $\theta$.
By (6.15) and (6.14) the probability in (5.16) may be bounded by

Poo,g( sup f ^J2^e^+f)'^z^-x\<2(;\z,\/VT^]d\^^\{x)>e] 
\V^\fi\<cHM5c' i=i J 

(6.16) < J2 exp{{8ABCC-e)Xn} 
v^l^l<c+C/'y6Gc 

<(^^ + 1^ exp{(8ASCC - e)Xn}, 
which converges to 0 if $8ABC\zeta < \varepsilon$ holds. □

We have proved the following result.

Proposition 6.1. Consider the linear regression model (6.1) with the
covariate vector $Z$ and the error $\varepsilon$ independent, both with unknown distributions.
The matrix $EZZ^T$ is nonsingular and the error distribution $G$ is
assumed to be symmetric about zero with finite Fisher information for location.
There exist $\sqrt n$-consistent and even adaptive estimators of $\theta$. For any
such estimator, any estimator of $G$ defined by (5.4), (6.10)-(6.12) and (5.12)
is weakly efficient in the sense of (5.14), that is, asymptotically linear in the
efficient influence function given in (6.9). Furthermore, the direct substitution
estimator $\hat G_{\hat\theta_n,n}$ is weakly efficient for all $b^* \in \mathcal B^*$ as in Lemma 6.1.

Remark 6.1. Note that with $k = 1$ and $Z$ degenerate at 1, this proposition
yields an efficient estimator of the error distribution in the classical
symmetric location problem. The ordinary location problem will be treated 
in Example 6.4. 

Remark 6.2. The idea of using the residuals to assess the error distri- 
bution is quite standard and has been around for a long time, for instance 
in testing for normality. 

Remark 6.3. Interest might be in the standardized symmetric error 
distribution, that is, in G standardized to have unit variance, as in Exam- 
ple 3.2. This leads to a nonadaptive situation in which approaches as in the 
next example should lead to efficient estimators. 

Example 6.3 (Parametrized linkage models). As in Example 3.2 we
consider the statistical model of $n$ i.i.d. copies of a random variable $X$ that
is linked to an error variable $\varepsilon$ with distribution function $G$ via

    $t_\theta(X) = \varepsilon,$

with $t_\theta : \mathcal{X} \to \mathbb{R}$ measurable and $\theta \in \Theta \subset \mathbb{R}^k$. Let $\theta$ be given. The empirical
distribution function of $t_\theta(X_i)$, $i = 1, \dots, n$, is (asymptotically) linear in the
influence function

(6.17)    $x \mapsto 1_{[t_\theta(x) \le \cdot]} - G(\cdot).$

This influence function and hence the empirical distribution function itself
are efficient in estimating the distribution function $G$ if $G$ and $\mathcal{G}$ are
unrestricted.

Typically, however, $G$ is constrained to be symmetric (as in the preceding
example) or to have, for example, mean 0. In general, if the constraints can
be described by

    $E_G\,\gamma(\varepsilon) = \int \gamma\,dG = 0$

for some fixed measurable function $\gamma : \mathbb{R} \to \mathbb{R}^l$, then the efficient influence
function in estimating $G$ may be obtained from (6.17) by projection [cf.,
e.g., (6.2.6) of Bickel, Klaassen, Ritov and Wellner (1993)] and equals

(6.18)    $x \mapsto 1_{[t_\theta(x) \le \cdot]} - G(\cdot) - E\bigl(1_{[\varepsilon \le \cdot]}\gamma^T(\varepsilon)\bigr)\{E\gamma(\varepsilon)\gamma^T(\varepsilon)\}^{-1}\gamma(t_\theta(x)).$

Under appropriate regularity conditions,

(6.19)
$$
\begin{aligned}
G_{\theta,n}(t) = {}& \frac{1}{n}\sum_{i=1}^{n} 1_{[t_\theta(X_i)\le t]}
 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} 1_{[t_\theta(X_i)\le t]}\gamma^T(t_\theta(X_i))\Bigr)\\
&\times\Bigl\{\frac{1}{n}\sum_{i=1}^{n}\gamma(t_\theta(X_i))\gamma^T(t_\theta(X_i))\Bigr\}^{-1}
 \frac{1}{n}\sum_{i=1}^{n}\gamma(t_\theta(X_i))
\end{aligned}
$$

is an efficient estimator of $G(t)$, $t \in \mathbb{R}$, within this restricted class $\mathcal{G}$ of
constrained distribution functions, given $\theta$. Subsequently, a weakly efficient
estimator of $G$ within the semiparametric model with unknown $\theta$ may be
obtained via the theorems of Section 5.

We will present the details of this approach for the particular case of
$k = 1$, $Z = 1$ a.s., that is, for the location model, in the next example.

Example 6.4 (Error distribution in location problem). Let $X_1, \dots, X_n$
be i.i.d. random variables, which are copies of a random variable $X$ with
unknown distribution $P \in \mathcal{P}$ and distribution function $F$ on $\mathbb{R}$. It is well known
that the empirical distribution function $F_n$ is efficient in estimating $F$, when
$F$ is completely unknown. Let us assume now that the $X_i$ have finite
variance and mean $\theta$. It is well known also that the sample mean $\bar X_n$ is efficient
in estimating $\theta$. With $t_\theta(X) = X - \theta = \varepsilon$, the error distribution function
$G \in \mathcal{G}$, the class of all distribution functions with mean zero, satisfies

    $G(t) = F(t + \theta), \quad t \in \mathbb{R}.$

Given $F_n$ and $\bar X_n$, a natural estimator of the unknown error distribution $G$,
which has mean zero, would be

(6.20)    $G_n(t) = F_n(t + \bar X_n), \quad t \in \mathbb{R}.$

In fact, $G_n$ is an asymptotically efficient estimator of the error distribution
function $G$, as may be shown by computation of the efficient influence
function along the lines of Example 5.3.8 of Bickel, Klaassen, Ritov and Wellner
(1993). Let $\Psi$ be a collection of bounded functions $\psi : \mathbb{R} \to \mathbb{R}$ with bounded
uniformly continuous derivative $\psi'$ and let $\nu$ map $\mathcal{P}$ into the Banach space
of bounded functions on $\Psi$ with the supremum norm such that

(6.21)    $\nu(P_{\theta,G})(\psi) = \nu(G)(\psi) = G(\psi) = \int \psi(t)\,dG(t), \quad \psi \in \Psi.$

Thus, $G$ is identified via $\nu(G)$ provided the class $\Psi$ is rich enough. Indeed,
$G_n$ from (6.20) is efficient, that is,

    $\sqrt n\Bigl(G_n(\psi) - G(\psi) - \frac{1}{n}\sum_{i=1}^{n}\tilde\ell(X_i;\psi)\Bigr) = o_P(1), \quad \psi \in \Psi,$

holds with the efficient influence function $\tilde\ell$ equal to [cf. (6.18)]

(6.22)    $\tilde\ell(x;\psi) = \psi(x - \theta) - E_{P_{\theta,G}}\psi(X - \theta) - E_{P_{\theta,G}}\psi'(X - \theta)\,(x - \theta).$

If we apply the approach of Section 5, we need an efficient estimator of $G$
for the case where $\theta$ is known. As explained via the parametrized linkage
models of Examples 3.2 and 6.3, the naive empirical distribution of $X_i - \theta$,
$i = 1, \dots, n$, is not efficient, but an explicit weighted empirical of the $X_i - \theta$,
$i = 1, \dots, n$, as given in the following proposition, is asymptotically efficient.



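Proposition 6.2 (Location known). Let $X_1, \dots, X_n$ be i.i.d. random
variables with known mean $\theta$ and distribution function $G(\cdot - \theta)$, where $G$ is
unknown with finite variance. The estimator

(6.23)    $G_{\theta,n}(t) = \frac{1}{n}\sum_{i=1}^{n}\Bigl\{1 - \frac{(\bar X_n - \theta)(X_i - \theta)}{S_n^2(\theta)}\Bigr\} 1_{(-\infty,t]}(X_i - \theta), \quad t \in \mathbb{R},$

with $S_n^2(\theta) = n^{-1}\sum_{i=1}^{n}(X_i - \theta)^2$, is weakly efficient in estimating the
error distribution function $G$ with $G$ identified via $\nu : \mathcal{G} \to B = \ell^\infty(L_2(G))$,
$\nu(G)(\psi) = \int \psi\,dG$ for $\psi \in L_2(G)$.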

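Proof. Without loss of generality we may take $\theta = 0$. For $\psi$ square
integrable with respect to $G$ we have

(6.24)
$$
\begin{aligned}
G_{0,n}(\psi) - G(\psi)
&= \frac{1}{n}\sum_{i=1}^{n}\bigl\{\psi(X_i) - E_G\psi(X)\bigr\} - \frac{\bar X_n}{S_n^2(0)}\,\frac{1}{n}\sum_{i=1}^{n}\psi(X_i)X_i\\
&= \frac{1}{n}\sum_{i=1}^{n}\Bigl\{\psi(X_i) - E_G\psi(X) - \frac{\operatorname{cov}_G(\psi(X),X)}{\operatorname{var}_G X}\,X_i\Bigr\} + o_P\Bigl(\frac{1}{\sqrt n}\Bigr),
\end{aligned}
$$

where the last equality is implied by the law of large numbers. Consequently,
$G_{0,n}$ is asymptotically linear in the efficient influence function as given in
Example 6.2.1 of Bickel, Klaassen, Ritov and Wellner (1993); see (6.18). □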
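
The weighted empirical of Proposition 6.2 is easy to compute directly. The following is a minimal sketch, assuming (6.23) attaches weight $n^{-1}\{1 - (\bar X_n - \theta)(X_i - \theta)/S_n^2(\theta)\}$ to the residual $X_i - \theta$; the function names and the data are hypothetical and serve only as illustration.

```python
def weighted_empirical(xs, theta):
    """Residuals X_i - theta and the weights that (6.23) attaches to them."""
    n = len(xs)
    resid = [x - theta for x in xs]
    mean_dev = sum(resid) / n            # X-bar_n - theta
    s2 = sum(r * r for r in resid) / n   # S_n^2(theta)
    if s2 == 0.0:
        raise ValueError("all observations equal theta; S_n^2(theta) is zero")
    weights = [(1.0 - mean_dev * r / s2) / n for r in resid]
    return resid, weights


def error_cdf_known_location(t, xs, theta):
    """Evaluate the weighted empirical distribution function at t."""
    resid, weights = weighted_empirical(xs, theta)
    return sum(w for r, w in zip(resid, weights) if r <= t)


xs = [0.3, 1.9, 2.4, 0.7, 3.1]   # illustrative data, not from the paper
theta = 1.5                       # mean treated as known
resid, weights = weighted_empirical(xs, theta)
weighted_mean = sum(w * r for r, w in zip(resid, weights))  # mean of the estimate
total_mass = sum(weights)                                   # value at t = +infinity
```

By construction the weights force the estimated error distribution to have mean zero, which is the constraint defining the model. The total mass works out to $1 - (\bar X_n - \theta)^2/S_n^2(\theta)$ rather than 1, so the estimate is in general not a proper distribution function.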

Remark 6.4. Note that $G_{\theta,n}$ from (6.23) is just $G_{\theta,n}$ from (6.19) for
$t_\theta(x) = x - \theta$, written appropriately. $G_{\theta,n}$ is a signed measure; in its far tails
it need not be monotone.

Plugging in $\theta = \bar X_n$ we obtain

(6.25)    $G_{\bar X_n,n}(t) = \frac{1}{n}\sum_{i=1}^{n} 1_{(-\infty,t]}(X_i - \bar X_n) = F_n(t + \bar X_n) = G_n(t),$

the estimator of $G$ from (6.20) which has been proved efficient above for
$B = \ell^\infty(\Psi)$ with a smaller set $\Psi$ than $L_2(G)$ as in Proposition 6.2. The sample
splitting and substitution technique of Theorem 5.1 yields a different though
similar efficient estimator of the error distribution $G$. Applying Lemma 6.1
and Theorem 5.2, we obtain the weak efficiency of $G_n$ for another $B$ and $B^*$.
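
The plug-in identity (6.25) can be checked numerically. The sketch below assumes the weighted form of (6.23) described in Proposition 6.2; the helper names and the four data points are hypothetical.

```python
def ecdf(sample, t):
    """Empirical distribution function F_n of the sample, evaluated at t."""
    return sum(1 for x in sample if x <= t) / len(sample)


def weighted_error_cdf(t, xs, theta):
    """Weighted empirical of the residuals X_i - theta, as in (6.23)."""
    n = len(xs)
    resid = [x - theta for x in xs]
    mean_dev = sum(resid) / n            # X-bar_n - theta
    s2 = sum(r * r for r in resid) / n   # S_n^2(theta)
    return sum((1.0 - mean_dev * r / s2) / n for r in resid if r <= t)


xs = [1.0, 2.0, 4.0, 5.0]                # illustrative data
xbar = sum(xs) / len(xs)                 # sample mean, here 3.0
grid = [-2.5, -1.5, 0.5, 1.5, 2.5]

# Substituting theta = X-bar_n makes every weight equal to 1/n ...
plug_in = [weighted_error_cdf(t, xs, xbar) for t in grid]
# ... so the result coincides with the shifted empirical F_n(t + X-bar_n).
shifted = [ecdf(xs, t + xbar) for t in grid]
```

At $\theta = \bar X_n$ the correction term in the weights vanishes, which is why the efficient estimator for known location and the naive empirical of the residuals lead to the same substitution estimator.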

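Note that plugging in $\bar X_n$ for $\theta$ into the empirical distribution
function $F_{\theta,n}(t) = \frac{1}{n}\sum_{i=1}^{n} 1_{(-\infty,t]}(X_i - \theta)$ of the $X_i - \theta$ yields the same
estimator $G_n(t)$ of $G(t)$. Although, as noted before, $F_{\theta,n}(t)$ is not an efficient
estimator for $\theta$ known, the combined estimator is. From Remark 5.1 we
know that the substitution estimator can be efficient even if $\nu_{\theta,n}$ is not
efficient, as long as the influence function of $\nu_{\theta,n}$ satisfies (5.15). In this case,
this translates to $F_{\theta,n}(t) = G_{\theta,n}(t) + b(\bar X_n - \theta) + o_P(1)$, for every $t \in \mathbb{R}$ and
some $b \in \mathbb{R}$. This is indeed the case, since by (6.23)

(6.26)
$$
\begin{aligned}
F_{\theta,n}(t) - G_{\theta,n}(t)
&= \frac{\bar X_n - \theta}{S_n^2(\theta)}\,\frac{1}{n}\sum_{i=1}^{n}(X_i - \theta)\,1_{(-\infty,t]}(X_i - \theta)\\
&= \frac{\bar X_n - \theta}{\operatorname{var} X}\cdot\bigl(E(X - \theta)1_{(-\infty,t]}(X - \theta) + o_P(1)\bigr),
\end{aligned}
$$

because of the law of large numbers.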

The empirical likelihood approach of Owen (1991) has been applied by
Qin and Lawless (1994) in their Example 3 (continued), page 314, to obtain
another implicitly defined efficient estimator $G^*_{\theta,n}$ of $G$. $G^*_{\theta,n}$ is a proper
distribution function and substitution of $\theta$ by $\bar X_n$ in $G^*_{\theta,n}$ yields $G_n$ as well.

Example 6.5 (Bootstrap). When constructing confidence intervals for
the mean $\theta$ using the sample mean $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$, one needs the
distribution of $\sqrt n(\bar X_n - \theta)$. It can be simulated once the distribution of $X - \theta =
X_1 - \theta$ is known. By the fundamental rule of thumb of statistics this
distribution of $X - \theta$ should be estimated when unknown. According to
Example 6.4 an efficient estimator of this distribution is $G_{\bar X_n,n} = F_n(\cdot + \bar X_n)$
from (6.25) and (6.20). In this way the distribution of $X - \theta$ under $F$ is
estimated by the distribution of $X^*$, say, under $F_n(\cdot + \bar X_n)$, which equals
the distribution of $X^* - \bar X_n$ under $F_n$. Via this approach we see why in the
bootstrap world the distribution of $X - \theta$ under $F$ should be replaced by
the distribution of $X^* - \bar X_n$ under $F_n$.
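
The simulation described in this example can be sketched in a few lines. This is an illustration under our own choices, not a procedure taken from the paper: the function name, the number of resamples, the seed, the simple order-statistic quantile rule and the data are all hypothetical.

```python
import random


def bootstrap_mean_interval(xs, level=0.95, n_boot=2000, seed=0):
    """Interval for the mean built from simulated sqrt(n) * (mean(X*) - X-bar_n)."""
    rng = random.Random(seed)
    n = len(xs)
    xbar = sum(xs) / n
    roots = []
    for _ in range(n_boot):
        # X* is drawn from F_n, so X* - xbar follows F_n(. + xbar),
        # the estimated (mean-zero) error distribution.
        star = [xs[rng.randrange(n)] for _ in range(n)]
        roots.append(n ** 0.5 * (sum(star) / n - xbar))
    roots.sort()
    alpha = 1.0 - level
    lower_q = roots[int(alpha / 2 * (n_boot - 1))]
    upper_q = roots[int((1.0 - alpha / 2) * (n_boot - 1))]
    # Invert lower_q <= sqrt(n) * (xbar - theta) <= upper_q for theta.
    return xbar - upper_q / n ** 0.5, xbar - lower_q / n ** 0.5


xs = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]  # illustrative data
low, high = bootstrap_mean_interval(xs)
```

The simulated quantity is centred at $\bar X_n$ rather than at the unknown $\theta$, which is exactly the replacement of $X - \theta$ under $F$ by $X^* - \bar X_n$ under $F_n$ described above.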

Example 6.6 (Baseline survival distribution in Cox's proportional
hazards model). We observe i.i.d. copies of $X = (Z, T)$, where the hazard
function of an individual with covariate $Z = z \in \mathbb{R}$ is given by

    $\lambda(t \mid z) = e^{\theta z}\lambda(t),$

where $\theta \in \mathbb{R}$ and $\lambda$ is the so-called baseline hazard function, corresponding
to covariate $z = 0$, and related to the Banach parameter $G$ as follows:

(6.27)    $\lambda = \dfrac{g}{1 - G}.$

Here $g$ is the density corresponding to the distribution function $G$ on $[0, \infty)$
of $T$, given $Z = 0$. Fix $T_0 > 0$ and define $\mathcal{G}$ to be all distribution functions
$G$ with $G(T_0) < 1$. We assume that the distribution of $Z$ is known and
has distribution function $F$. Furthermore, we denote Lebesgue measure on
$(0, \infty)$ by $\mu$ and note that identification of $G$ with $\sqrt g$ yields $\mathcal{G} \subset L_2(\mu)$.
Then the density of $(Z, T)$ with respect to $\mu \times F$ is

(6.28)    $p(z, t; \theta, G) = e^{\theta z} g(t)\,(1 - G(t))^{\exp(\theta z) - 1}.$

As in Example 3.4.2 of Bickel, Klaassen, Ritov and Wellner (1993), it is not
difficult to see that

(6.29)    $\dot\ell_1(z, t; \theta) = z\bigl(1 - e^{\theta z}\Lambda(t)\bigr)$
with

    $\Lambda(t) = \int_0^t \lambda(s)\,ds = \int_0^t \frac{g(s)}{1 - G(s)}\,ds = -\log(1 - G(t)).$
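
The score (6.29) follows from (6.28) by differentiating the log density with respect to $\theta$. A worked sketch, assuming (6.28) reads $p(z,t;\theta,G) = e^{\theta z} g(t)(1 - G(t))^{\exp(\theta z) - 1}$ and writing $\dot\ell_1$ for the score for $\theta$ (this symbol is our reading of the notation):

```latex
% Take logarithms in (6.28), differentiate in theta, and use
% Lambda(t) = -log(1 - G(t)).
\begin{align*}
\log p(z,t;\theta,G) &= \theta z + \log g(t) + \bigl(e^{\theta z}-1\bigr)\log\bigl(1-G(t)\bigr),\\
\dot\ell_1(z,t;\theta) = \frac{\partial}{\partial\theta}\log p(z,t;\theta,G)
  &= z + z\,e^{\theta z}\log\bigl(1-G(t)\bigr)
   = z\bigl(1-e^{\theta z}\Lambda(t)\bigr).
\end{align*}
```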

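Representing $\mathcal{G}$ in $L_2(G)$, we get $\dot{\mathcal{G}} = L_2^0(G)$, and $\dot\ell_2 : \dot{\mathcal{G}} \to L_2(P_{\theta,G})$ is
given by

(6.30)    $(\dot\ell_2(\theta)a)(z, t) = a(t) + (e^{\theta z} - 1)\,\dfrac{\int_t^\infty a\,dG}{1 - G(t)}.$

It is well known [cf. Tsiatis (1981)] that if $EZ^2\exp(2\theta Z)$ is bounded
uniformly in a neighborhood of $\theta_0$, then the Cox (1972) partial likelihood
estimator $\hat\theta_n$ is (locally) regular and asymptotically linear in the efficient
influence function, where $\ell_1^*$ of (4.10) is given by

(6.31)    $\ell_1^*(z, t; \theta) = \dot\ell_1(z, t; \theta) - \Bigl(\dfrac{S_{1,\theta}}{S_{0,\theta}}(t) - e^{\theta z}\int_0^t \dfrac{S_{1,\theta}}{S_{0,\theta}}\,d\Lambda\Bigr),$
with

(6.32)    $S_{i,\theta}(t) = E\,Z^i e^{\theta Z} 1_{[t,\infty)}(T), \quad i = 0, 1.$

A complete proof of efficiency in a strong sense is given in Klaassen (1989)
under nondegeneracy and boundedness of $Z$.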

We are interested in estimating the baseline distribution function $\nu(G) =
G$ on an interval $[0, T_0]$ with $P_G(T > T_0) > 0$. In view of this bounded window
we will restrict $\mathcal{G}_0$ to all paths at $G$ in $\mathcal{G}$ with tangent $h$ vanishing outside
$[0, T_0]$, yielding

(6.33)    $\dot{\mathcal{G}}_0 = \{h \in L_2^0(G) : h = h 1_{[0,T_0]}\}.$

Furthermore, we will assume that $|Z|$ is bounded a.s. by $C < \infty$. With the
notation

(6.34)    $\tilde S_{i,\theta}(t) = E_\theta Z^i e^{2\theta Z} 1_{[t,\infty)}(T), \quad i = 0, 1,$
we have

(6.35)    $\dot S_{0,\theta}(t) = \dfrac{\partial}{\partial\theta} S_{0,\theta}(t) = S_{1,\theta}(t) - \tilde S_{1,\theta}(t)\Lambda(t).$

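To verify (5.5) we note that for $h \in \dot{\mathcal{G}}_0$ [cf. Example 6.7.1.A of Bickel,
Klaassen, Ritov and Wellner (1993)]

(6.36)
$$
\begin{aligned}
\psi_h(z, t; \theta) = {}& \Bigl((1 - G(t))h(t) - \int_t^\infty h\,dG\Bigr)\Big/ S_{0,\theta}(t)\\
& - e^{\theta z}\int_0^t \Bigl((1 - G(s))h(s) - \int_s^\infty h\,dG\Bigr)\Big/ S_{0,\theta}(s)\,d\Lambda(s)
\end{aligned}
$$

holds. It follows from Example 3.5 in Schick (2001) that (2.25) holds for
$\psi_h$ from (6.36) where $h$ is associated with a $b^*$ corresponding to a signed
measure $\mu$ on $[0, T_0]$ as in (6.40). Indeed, $h(x) = \dot\nu^T b^*(x) = \mu([x, T_0]) - \int G\,d\mu$
holds, and

    $(1 - G(t))h(t) - \int_t^\infty h(s)\,dG(s) = -\int_t^\infty \mu([s, T_0])\,dG(s) = 0, \quad t > T_0.$

This yields (5.5) and (5.6) with $c_h(\theta) = E_\theta\bigl(\dot\ell_1(Z, T; \theta)\psi_h(Z, T; \theta)\bigr)$.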

Given the regression parameter $\theta$, the nonparametric maximum likelihood
estimator $\hat G_{\theta,n}$ of the baseline distribution function $G$ may be derived from
the nonparametric maximum likelihood estimator of the baseline cumulative
hazard function $\Lambda$, as described in Section 1 of Johansen (1983), and it equals

(6.37)    $\hat G_{\theta,n}(s) = 1 - \exp\Bigl\{-\sum_{i=1}^{n} 1_{[0,s]}(T_i)\Bigl(\sum_{j=1}^{n} 1_{[T_j \ge T_i]}e^{\theta Z_j}\Bigr)^{-1}\Bigr\}, \quad s \ge 0.$

Breslow [(1974), (7), page 93] proposed the Kaplan-Meier-type estimator

(6.38)    $\tilde G_{\theta,n}(s) = 1 - \prod_{i=1}^{n}\Bigl\{1 - 1_{[0,s]}(T_i)\Bigl(\sum_{j=1}^{n} 1_{[T_j \ge T_i]}e^{\theta Z_j}\Bigr)^{-1}\Bigr\}.$
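
Both estimators are built from the same risk-set sums and can be computed side by side. The sketch below assumes uncensored, nonnegative survival times as in the model of this example, and reads (6.37) as one minus the exponential of a sum and (6.38) as one minus a product over the same terms; the function names and data are hypothetical.

```python
import math


def risk_sums(times, covs, theta):
    """For each i, the sum of exp(theta * Z_j) over the risk set {j : T_j >= T_i}."""
    return [
        sum(math.exp(theta * z) for tj, z in zip(times, covs) if tj >= ti)
        for ti in times
    ]


def npmle_baseline_cdf(s, times, covs, theta):
    """Exponential-type estimator of the baseline distribution function, cf. (6.37)."""
    denoms = risk_sums(times, covs, theta)
    cum_hazard = sum(1.0 / d for ti, d in zip(times, denoms) if ti <= s)
    return 1.0 - math.exp(-cum_hazard)


def product_baseline_cdf(s, times, covs, theta):
    """Kaplan-Meier-type product estimator, cf. (6.38)."""
    denoms = risk_sums(times, covs, theta)
    survival = 1.0
    for ti, d in zip(times, denoms):
        if ti <= s:
            survival *= 1.0 - 1.0 / d
    return 1.0 - survival


times = [0.5, 1.2, 2.0, 3.1]   # illustrative survival times
covs = [0.3, -0.4, 1.0, 0.0]   # illustrative covariates
g_hat = npmle_baseline_cdf(1.5, times, covs, theta=0.0)
g_tilde = product_baseline_cdf(1.5, times, covs, theta=0.0)
```

With $\theta = 0$ the covariates drop out and the product form reduces to the ordinary empirical distribution function of the $T_i$, here $2/4$ at $s = 1.5$, while the exponential form gives the slightly smaller value $1 - e^{-7/12}$.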

Both these estimators are asymptotically linear in the efficient influence
function

(6.39)
$$
\begin{aligned}
\psi_{1_{[0,s]} - G(s)}(z, t; \theta)
&= \bigl(\dot\ell_2(\theta)(\dot\ell_2^T(\theta)\dot\ell_2(\theta))^{-1}(1_{[0,s]}(\cdot) - G(s))\bigr)(z, t)\\
&= (1 - G(s))\Bigl\{\frac{1}{S_{0,\theta}(t)}\,1_{[0,s]}(t) - e^{\theta z}\int_0^{s \wedge t}\frac{1}{S_{0,\theta}}\,d\Lambda\Bigr\}
\end{aligned}
$$

uniformly in $s \in [0, T_0]$; see Section 4 of Tsiatis (1981) and Example 6.7.1.A
of Bickel, Klaassen, Ritov and Wellner (1993). They are even weakly locally
submodel efficient under the assumption of boundedness of $Z$ for $B$ the
cadlag functions on $[0, T_0]$ with supremum norm and $b^* \in B_0^*$ of the type

(6.40)    $b^*(b) = \int_{[0,T_0]} b(s)\,d\mu(s)$

for some finite signed measure $\mu$. To verify this and for future use we need
the following result.

Lemma 6.2. If $T_1, \dots, T_n$ are random variables with empirical
distribution function $F_n$, then the statistic

(6.41)    $V_n(s) = \sum_{i=1}^{n} 1_{[T_i \le s]}\Bigl(\sum_{j=1}^{n} 1_{[T_j \ge T_i]}\Bigr)^{-1}$

satisfies

(6.42)    $V_n(s) \le -\log(1 - F_n(s)).$

Proof. With $T_{(1)} \le T_{(2)} \le \cdots \le T_{(n)}$ the order statistics we have

$$
V_n(s) = \sum_{i=1}^{n} 1_{[T_{(i)} \le s]}\,\frac{1}{n - i + 1}
       = \sum_{i=1}^{nF_n(s)} \frac{1}{n - i + 1}
       \le \int_1^{nF_n(s)+1} \frac{1}{n + 1 - x}\,dx = -\log(1 - F_n(s)). \qquad □
$$
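
The inequality of Lemma 6.2 is deterministic and can be checked on any sample. A minimal sketch, with hypothetical function names and data, that evaluates both sides of (6.42):

```python
import math
import random


def v_n(s, times):
    """V_n(s): sum over T_i <= s of 1 / #{j : T_j >= T_i}, as in (6.41)."""
    total = 0.0
    for ti in times:
        if ti <= s:
            at_risk = sum(1 for tj in times if tj >= ti)
            total += 1.0 / at_risk
    return total


def log_bound(s, times):
    """Right-hand side of (6.42): -log(1 - F_n(s)), infinite once F_n(s) = 1."""
    below = sum(1 for t in times if t <= s)
    if below == len(times):
        return math.inf
    return -math.log(1.0 - below / len(times))


times = [0.5, 1.2, 2.0, 3.1]       # illustrative data without ties
left = v_n(1.5, times)             # 1/4 + 1/3
right = log_bound(1.5, times)      # -log(1 - 2/4)

rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]
holds_everywhere = all(v_n(s, sample) <= log_bound(s, sample) for s in sample)
```

The bound is the familiar comparison of a harmonic-type sum with the integral of $1/(n + 1 - x)$; it is what later allows $V_n$ to be controlled through the empirical distribution function alone.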



We also need the following convergence result.

Lemma 6.3. Denote

(6.43)    $W_n(t; \theta) = \Bigl(\frac{1}{n}\sum_{j=1}^{n} 1_{[T_j \ge t]}Z_j e^{\theta Z_j}\Bigr)\Bigl(\frac{1}{n}\sum_{j=1}^{n} 1_{[T_j \ge t]}e^{\theta Z_j}\Bigr)^{-2}.$

In the Cox proportional hazards model of (6.28) with $|Z|$ bounded we have
for $0 \le s \le T_0$,

(6.44)    $\frac{1}{n}\sum_{i=1}^{n} 1_{[T_i \le s]}W_n(T_i; \theta) \xrightarrow{P} \int_0^s \frac{S_{1,\theta}}{S_{0,\theta}}(t)\,d\Lambda(t).$

Proof. Conditionally, given $T_i = t \le s$, the statistic $W_n(t; \theta)$ converges
in probability to $S_{1,\theta}S_{0,\theta}^{-2}(t)$ where both $W_n(t; \theta)$ and its limit are bounded
a.s. Consequently, given $T_i \le s$ the difference $|W_n(T_i; \theta) - S_{1,\theta}S_{0,\theta}^{-2}(T_i)|$
converges in mean to 0 and hence the lemma holds. □

Combining these lemmata, we see that the nonparametric maximum
likelihood estimator $\hat G_{\theta,n}(s)$ satisfies
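(6.45)
$$
\begin{aligned}
&\sqrt n\bigl(\hat G_{\hat\theta_n,n}(s) - \hat G_{\theta,n}(s)\bigr) + \sqrt n(\hat\theta_n - \theta)(1 - G(s))\int_0^s \frac{S_{1,\theta}}{S_{0,\theta}}\,d\Lambda\\
&\quad= \sqrt n\int_\theta^{\hat\theta_n}\Bigl\{-\bigl(1 - \hat G_{\eta,n}(s)\bigr)\frac{1}{n}\sum_{i=1}^{n} 1_{[T_i \le s]}W_n(T_i; \eta) + (1 - G(s))\int_0^s \frac{S_{1,\theta}}{S_{0,\theta}}\,d\Lambda\Bigr\}\,d\eta\\
&\quad= O_P\Bigl(\sqrt n\int_\theta^{\hat\theta_n}\frac{1}{n}\sum_{i=1}^{n} 1_{[T_i \le s]}\bigl|W_n(T_i; \eta) - W_n(T_i; \theta)\bigr|\,d\eta\Bigr) + o_P(1)\\
&\quad= O_P\Bigl(\sqrt n\int_\theta^{\hat\theta_n}(\eta - \theta)V_n(s)\,d\eta\Bigr) + o_P(1) = o_P(1)
\end{aligned}
$$

under $\theta$, uniformly in $s \in [0, T_0]$. Note that by (6.27), (6.30) and Lemma 2.2,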
1 " 



+ ^[On - e)G[t 



'-'0 



Si 



holds uniformly in $s \in [0, T_0]$. The asymptotic linearity of $\hat G_{\theta,n}(s)$, (6.45)
and (6.46) together imply that $\hat G_{\theta,n}(\cdot)$ is weakly locally submodel efficient
on $[0, T_0]$ in the sense of (5.1) with $b^*$ as in (6.40). Finally, note [cf. (6.45)]



V^\GsJs)-G^Js)\<V^ 



(6.47) 



-J2MT^<s]WniTi■,r])dr] 



Je 



Vn{s). 



By Lemma 6.2 this yields (5.16) with $b^* \in B_0^*$ as in (6.40), since

(6.48)    $P_\theta\Bigl(\int_{[0,T_0]} V_n(s)\,d|\mu|(s) \ge c_0\epsilon/\zeta\Bigr) \le P_\theta\Bigl(F_n(T_0) \ge 1 - \exp\bigl\{-(c_0\epsilon)/(|\mu|([0, T_0])\zeta)\bigr\}\Bigr)$

is arbitrarily small for $\zeta$ sufficiently small. We have proved that Theorem 5.2
may be applied and that the full nonparametric maximum likelihood
estimator $\hat G_{\hat\theta_n,n}(s)$ of the baseline distribution function $G$ is efficient if $\hat\theta_n$ is
efficient. By similar arguments this may be shown also for Breslow's
estimator $\tilde G_{\hat\theta_n,n}(s)$.

Proposition 6.3. Consider the Cox proportional hazards model of (6.28)
with the covariate $Z$ bounded a.s. in absolute value. If $\hat\theta_n$ is an efficient
estimator of the regression parameter $\theta$, then both $\hat G_{\hat\theta_n,n}(s)$ and $\tilde G_{\hat\theta_n,n}(s)$ are
weakly efficient in estimating $G(s)$ in the sense of (5.10) with $b^* \in B_0^*$ as
in (6.40).